Overview

Dataset statistics

Number of variables: 23
Number of observations: 45542
Missing cells: 102661
Missing cells (%): 9.8%
Duplicate rows: 52
Duplicate rows (%): 0.1%
Total size in memory: 8.0 MiB
Average record size in memory: 184.0 B

Variable types

Categorical: 15
Numeric: 7
Unsupported: 1

Alerts

Dataset has 52 (0.1%) duplicate rows [Duplicates]
belongs_to_collection has a high cardinality: 1645 distinct values [High cardinality]
genres has a high cardinality: 4069 distinct values [High cardinality]
id has a high cardinality: 45436 distinct values [High cardinality]
original_language has a high cardinality: 92 distinct values [High cardinality]
overview has a high cardinality: 44307 distinct values [High cardinality]
poster_path has a high cardinality: 45024 distinct values [High cardinality]
production_companies has a high cardinality: 22581 distinct values [High cardinality]
production_countries has a high cardinality: 2387 distinct values [High cardinality]
release_date has a high cardinality: 17333 distinct values [High cardinality]
spoken_languages has a high cardinality: 1843 distinct values [High cardinality]
tagline has a high cardinality: 20283 distinct values [High cardinality]
title has a high cardinality: 42277 distinct values [High cardinality]
cast has a high cardinality: 42663 distinct values [High cardinality]
crew has a high cardinality: 42899 distinct values [High cardinality]
original_language is highly imbalanced (67.6%) [Imbalance]
production_countries is highly imbalanced (58.4%) [Imbalance]
spoken_languages is highly imbalanced (61.2%) [Imbalance]
status is highly imbalanced (96.9%) [Imbalance]
return has 2035 (4.5%) infinite values [Infinite]
belongs_to_collection has 41039 (90.1%) missing values [Missing]
overview has 954 (2.1%) missing values [Missing]
tagline has 25103 (55.1%) missing values [Missing]
return has 34592 (76.0%) missing values [Missing]
id is uniformly distributed [Uniform]
overview is uniformly distributed [Uniform]
poster_path is uniformly distributed [Uniform]
tagline is uniformly distributed [Uniform]
title is uniformly distributed [Uniform]
popularity is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
budget has 36627 (80.4%) zeros [Zeros]
revenue has 38114 (83.7%) zeros [Zeros]
runtime has 1559 (3.4%) zeros [Zeros]
vote_average has 3005 (6.6%) zeros [Zeros]
vote_count has 2906 (6.4%) zeros [Zeros]
return has 3522 (7.7%) zeros [Zeros]
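The infinite, missing, and zero counts reported for return are what a ratio produces when its denominator is frequently zero. The sketch below assumes, and the report does not state this, that return is revenue divided by budget and that a zero in either column means the amount was not recorded. The helper names and the sample pairs are hypothetical.

```python
import math
from typing import Optional


def _is_known(amount: Optional[float]) -> bool:
    """True when the amount is present, not NaN, and strictly positive."""
    return amount is not None and not math.isnan(amount) and amount > 0


def safe_return(revenue: Optional[float], budget: Optional[float]) -> Optional[float]:
    """Return revenue / budget, or None when either amount is unknown.

    Assumption: a zero in either column means "not recorded", so the
    ratio is left undefined instead of becoming 0, NaN, or infinity.
    """
    if not (_is_known(revenue) and _is_known(budget)):
        return None
    return revenue / budget


# Made-up (revenue, budget) pairs for illustration only.
pairs = [(100.0, 50.0), (100.0, 0.0), (0.0, 0.0), (0.0, 50.0)]
ratios = [safe_return(r, b) for r, b in pairs]
```

Under this assumption only rows with both amounts recorded keep a ratio, so the Infinite and Zeros alerts for return would both disappear and the rows behind them would be counted as missing instead.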

Reproduction

Analysis started: 2023-06-11 00:51:55.777028
Analysis finished: 2023-06-11 00:56:10.774076
Duration: 4 minutes and 15 seconds
Software version: pandas-profiling v3.6.6
Download configuration: config.json
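A report of this kind can be regenerated along the following lines. This is a minimal sketch, assuming the data is available as a CSV file; the function name, the file paths, and the report title are hypothetical, and the settings stored in config.json are not reproduced here.

```python
def build_report(csv_path: str, html_path: str = "report.html") -> None:
    """Profile the CSV at csv_path and write an HTML report to html_path."""
    # Imported lazily so this module loads even when the packages are absent.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    report = ProfileReport(df, title="Dataset profile")  # title is a placeholder
    report.to_file(html_path)
```

Calling build_report("movies.csv") would write report.html to the working directory, where movies.csv stands in for whatever file holds the 45542 observations.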

Variables

belongs_to_collection
Categorical

HIGH CARDINALITY  MISSING 

Distinct: 1645
Distinct (%): 36.5%
Missing: 41039
Missing (%): 90.1%
Memory size: 355.9 KiB

Value | Count
[] | 110
['The Bowery Boys'] | 29
['Totò Collection'] | 27
['Pokémon Collection'] | 26
['James Bond Collection'] | 26
Other values (1640) | 4285

Length

Max length: 58
Median length: 46
Mean length: 27.123917
Min length: 2

Characters and Unicode

Total characters: 122139
Distinct characters: 165
Distinct categories: 12
Distinct scripts: 7
Distinct blocks: 8
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 374
Unique (%): 8.3%

Sample

1st row: ['Toy Story Collection']
2nd row: ['Grumpy Old Men Collection']
3rd row: ['Father of the Bride Collection']
4th row: ['James Bond Collection']
5th row: ['Balto Collection']
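The sample values are lists stored as text, which is why the profiler treats each distinct string as its own category. A minimal sketch of turning one such cell back into a list, using only the standard library; parse_list_cell is a hypothetical helper, and treating unparsable text as an empty list is an assumption.

```python
import ast
from typing import List, Optional


def parse_list_cell(cell: Optional[str]) -> List[str]:
    """Turn a cell such as "['Toy Story Collection']" into a Python list.

    Missing cells and text that is not a list literal yield an empty list.
    """
    if cell is None:
        return []
    try:
        value = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return []
    return list(value) if isinstance(value, (list, tuple)) else []


names = parse_list_cell("['Toy Story Collection']")
empty = parse_list_cell("[]")
```

With this parsing, the 110 cells holding "[]" and the 41039 missing cells both become empty lists, which may or may not be the distinction an analysis wants to keep.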

Common Values

Value | Count | Frequency (%)
[] | 110 | 0.2%
['The Bowery Boys'] | 29 | 0.1%
['Totò Collection'] | 27 | 0.1%
['Pokémon Collection'] | 26 | 0.1%
['James Bond Collection'] | 26 | 0.1%
['Zatôichi: The Blind Swordsman'] | 26 | 0.1%
['The Carry On Collection'] | 25 | 0.1%
['Charlie Chan (Sidney Toler) Collection'] | 21 | < 0.1%
['Godzilla (Showa) Collection'] | 16 | < 0.1%
['Charlie Chan (Warner Oland) Collection'] | 15 | < 0.1%
Other values (1635) | 4182 | 9.2%
(Missing) | 41039 | 90.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
collection 3659
25.2%
the 1139
 
7.8%
243
 
1.7%
of 229
 
1.6%
series 146
 
1.0%
and 84
 
0.6%
trilogy 82
 
0.6%
a 60
 
0.4%
man 60
 
0.4%
in 56
 
0.4%
Other values (2316) 8784
60.4%

Most occurring characters

ValueCountFrequency (%)
o 10837
 
8.9%
e 10233
 
8.4%
10040
 
8.2%
l 9935
 
8.1%
' 8786
 
7.2%
i 7360
 
6.0%
n 7222
 
5.9%
t 6333
 
5.2%
c 4743
 
3.9%
[ 4508
 
3.7%
Other values (155) 42142
34.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 79098
64.8%
Uppercase Letter 13562
 
11.1%
Space Separator 10040
 
8.2%
Other Punctuation 9245
 
7.6%
Open Punctuation 4836
 
4.0%
Close Punctuation 4836
 
4.0%
Decimal Number 321
 
0.3%
Dash Punctuation 150
 
0.1%
Other Letter 37
 
< 0.1%
Final Punctuation 9
 
< 0.1%
Other values (2) 5
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 10837
13.7%
e 10233
12.9%
l 9935
12.6%
i 7360
9.3%
n 7222
9.1%
t 6333
8.0%
c 4743
6.0%
a 4333
 
5.5%
r 3799
 
4.8%
s 2454
 
3.1%
Other values (68) 11849
15.0%
Uppercase Letter
ValueCountFrequency (%)
C 4366
32.2%
T 1502
 
11.1%
S 1053
 
7.8%
B 667
 
4.9%
M 609
 
4.5%
D 501
 
3.7%
A 490
 
3.6%
H 447
 
3.3%
P 419
 
3.1%
G 412
 
3.0%
Other values (33) 3096
22.8%
Other Letter
ValueCountFrequency (%)
3
 
8.1%
3
 
8.1%
3
 
8.1%
3
 
8.1%
3
 
8.1%
3
 
8.1%
3
 
8.1%
3
 
8.1%
3
 
8.1%
2
 
5.4%
Other values (4) 8
21.6%
Other Punctuation
ValueCountFrequency (%)
' 8786
95.0%
. 168
 
1.8%
: 99
 
1.1%
, 76
 
0.8%
& 50
 
0.5%
! 34
 
0.4%
/ 21
 
0.2%
* 4
 
< 0.1%
? 4
 
< 0.1%
3
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
1 80
24.9%
9 64
19.9%
3 54
16.8%
0 51
15.9%
2 21
 
6.5%
8 13
 
4.0%
5 12
 
3.7%
7 11
 
3.4%
6 10
 
3.1%
4 5
 
1.6%
Open Punctuation
ValueCountFrequency (%)
[ 4508
93.2%
( 328
 
6.8%
Close Punctuation
ValueCountFrequency (%)
] 4508
93.2%
) 328
 
6.8%
Dash Punctuation
ValueCountFrequency (%)
- 148
98.7%
2
 
1.3%
Space Separator
ValueCountFrequency (%)
10040
100.0%
Final Punctuation
ValueCountFrequency (%)
9
100.0%
Modifier Letter
ValueCountFrequency (%)
3
100.0%
Other Number
ValueCountFrequency (%)
½ 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 92246
75.5%
Common 29442
 
24.1%
Cyrillic 414
 
0.3%
Hiragana 15
 
< 0.1%
Hangul 10
 
< 0.1%
Katakana 9
 
< 0.1%
Han 3
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 10837
11.7%
e 10233
11.1%
l 9935
10.8%
i 7360
 
8.0%
n 7222
 
7.8%
t 6333
 
6.9%
c 4743
 
5.1%
C 4366
 
4.7%
a 4333
 
4.7%
r 3799
 
4.1%
Other values (69) 23085
25.0%
Cyrillic
ValueCountFrequency (%)
л 48
 
11.6%
и 41
 
9.9%
о 37
 
8.9%
к 30
 
7.2%
е 27
 
6.5%
я 25
 
6.0%
а 17
 
4.1%
К 16
 
3.9%
ц 16
 
3.9%
р 14
 
3.4%
Other values (32) 143
34.5%
Common
ValueCountFrequency (%)
10040
34.1%
' 8786
29.8%
[ 4508
15.3%
] 4508
15.3%
) 328
 
1.1%
( 328
 
1.1%
. 168
 
0.6%
- 148
 
0.5%
: 99
 
0.3%
1 80
 
0.3%
Other values (20) 449
 
1.5%
Hiragana
ValueCountFrequency (%)
3
20.0%
3
20.0%
3
20.0%
3
20.0%
3
20.0%
Hangul
ValueCountFrequency (%)
2
20.0%
2
20.0%
2
20.0%
2
20.0%
2
20.0%
Katakana
ValueCountFrequency (%)
3
33.3%
3
33.3%
3
33.3%
Han
ValueCountFrequency (%)
3
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 121425
99.4%
Cyrillic 414
 
0.3%
None 246
 
0.2%
Hiragana 15
 
< 0.1%
Punctuation 14
 
< 0.1%
Katakana 12
 
< 0.1%
Hangul 10
 
< 0.1%
CJK 3
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 10837
 
8.9%
e 10233
 
8.4%
10040
 
8.3%
l 9935
 
8.2%
' 8786
 
7.2%
i 7360
 
6.1%
n 7222
 
5.9%
t 6333
 
5.2%
c 4743
 
3.9%
[ 4508
 
3.7%
Other values (67) 41428
34.1%
None
ValueCountFrequency (%)
é 49
19.9%
ä 38
15.4%
ô 35
14.2%
ò 28
11.4%
ö 19
 
7.7%
ó 14
 
5.7%
ı 14
 
5.7%
í 9
 
3.7%
İ 4
 
1.6%
á 4
 
1.6%
Other values (18) 32
13.0%
Cyrillic
ValueCountFrequency (%)
л 48
 
11.6%
и 41
 
9.9%
о 37
 
8.9%
к 30
 
7.2%
е 27
 
6.5%
я 25
 
6.0%
а 17
 
4.1%
К 16
 
3.9%
ц 16
 
3.9%
р 14
 
3.4%
Other values (32) 143
34.5%
Punctuation
ValueCountFrequency (%)
9
64.3%
3
 
21.4%
2
 
14.3%
Katakana
ValueCountFrequency (%)
3
25.0%
3
25.0%
3
25.0%
3
25.0%
CJK
ValueCountFrequency (%)
3
100.0%
Hiragana
ValueCountFrequency (%)
3
20.0%
3
20.0%
3
20.0%
3
20.0%
3
20.0%
Hangul
ValueCountFrequency (%)
2
20.0%
2
20.0%
2
20.0%
2
20.0%
2
20.0%

budget
Real number (ℝ)

Distinct: 1223
Distinct (%): 2.7%
Missing: 3
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 4223191.5
Minimum: 0
Maximum: 3.8 × 10^8
Zeros: 36627
Zeros (%): 80.4%
Negative: 0
Negative (%): 0.0%
Memory size: 355.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 25000000
Maximum: 3.8 × 10^8
Range: 3.8 × 10^8
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 17413544
Coefficient of variation (CV): 4.1233139
Kurtosis: 66.822108
Mean: 4223191.5
Median Absolute Deviation (MAD): 0
Skewness: 7.127262
Sum: 1.9231992 × 10^11
Variance: 3.0323153 × 10^14
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
0 | 36627 | 80.4%
5000000 | 286 | 0.6%
10000000 | 261 | 0.6%
20000000 | 243 | 0.5%
2000000 | 242 | 0.5%
15000000 | 226 | 0.5%
3000000 | 223 | 0.5%
25000000 | 206 | 0.5%
1000000 | 197 | 0.4%
30000000 | 192 | 0.4%
Other values (1213) | 6836 | 15.0%

Minimum 10 values

Value | Count | Frequency (%)
0 | 36627 | 80.4%
1 | 25 | 0.1%
2 | 14 | < 0.1%
3 | 9 | < 0.1%
4 | 10 | < 0.1%
5 | 8 | < 0.1%
6 | 5 | < 0.1%
7 | 4 | < 0.1%
8 | 5 | < 0.1%
9 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
380000000 | 1 | < 0.1%
300000000 | 1 | < 0.1%
280000000 | 1 | < 0.1%
270000000 | 1 | < 0.1%
260000000 | 3 | < 0.1%
258000000 | 1 | < 0.1%
255000000 | 1 | < 0.1%
250000000 | 10 | < 0.1%
245000000 | 2 | < 0.1%
237000000 | 1 | < 0.1%

genres
Categorical

Distinct: 4069
Distinct (%): 8.9%
Missing: 0
Missing (%): 0.0%
Memory size: 355.9 KiB

Value | Count
['Drama'] | 5008
['Comedy'] | 3623
['Documentary'] | 2728
[] | 2443
['Drama', 'Romance'] | 1303
Other values (4064) | 30437

Length

Max length: 98
Median length: 84
Mean length: 21.601642
Min length: 2

Characters and Unicode

Total characters: 983782
Distinct characters: 43
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2364
Unique (%): 5.2%

Sample

1st row: ['Animation', 'Comedy', 'Family']
2nd row: ['Adventure', 'Fantasy', 'Family']
3rd row: ['Romance', 'Comedy']
4th row: ['Comedy', 'Drama', 'Romance']
5th row: ['Comedy']
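Because each cell holds a whole list of genres, the 4069 distinct values are combinations, not individual genres. A minimal sketch of counting individual genres, using the five sample rows above as input; count_genres is a hypothetical helper and it assumes every cell is a well-formed list literal.

```python
import ast
from collections import Counter
from typing import Iterable


def count_genres(cells: Iterable[str]) -> Counter:
    """Count how many rows mention each genre across list-like text cells."""
    counts: Counter = Counter()
    for cell in cells:
        # Each cell is text such as "['Comedy', 'Drama']"; parse it safely.
        counts.update(ast.literal_eval(cell))
    return counts


# The five sample rows from the genres column.
sample = [
    "['Animation', 'Comedy', 'Family']",
    "['Adventure', 'Fantasy', 'Family']",
    "['Romance', 'Comedy']",
    "['Comedy', 'Drama', 'Romance']",
    "['Comedy']",
]
genre_counts = count_genres(sample)
```

Counting this way also merges orderings such as ['Comedy', 'Drama'] and ['Drama', 'Comedy'], which the profiler reports as separate categories.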

Common Values

Value | Count | Frequency (%)
['Drama'] | 5008 | 11.0%
['Comedy'] | 3623 | 8.0%
['Documentary'] | 2728 | 6.0%
[] | 2443 | 5.4%
['Drama', 'Romance'] | 1303 | 2.9%
['Comedy', 'Drama'] | 1140 | 2.5%
['Horror'] | 974 | 2.1%
['Comedy', 'Romance'] | 930 | 2.0%
['Comedy', 'Drama', 'Romance'] | 593 | 1.3%
['Drama', 'Comedy'] | 534 | 1.2%
Other values (4059) | 26266 | 57.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
drama 20312
20.8%
comedy 13196
13.5%
thriller 7640
 
7.8%
romance 6746
 
6.9%
action 6607
 
6.8%
horror 4679
 
4.8%
crime 4314
 
4.4%
documentary 3937
 
4.0%
adventure 3508
 
3.6%
science 3061
 
3.1%
Other values (37) 23582
24.2%

Most occurring characters

ValueCountFrequency (%)
' 182588
18.6%
r 69270
 
7.0%
a 62006
 
6.3%
e 55949
 
5.7%
m 53232
 
5.4%
52040
 
5.3%
o 48661
 
4.9%
, 48195
 
4.9%
[ 45542
 
4.6%
] 45542
 
4.6%
Other values (33) 320757
32.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 513960
52.2%
Other Punctuation 230783
23.5%
Uppercase Letter 95915
 
9.7%
Space Separator 52040
 
5.3%
Open Punctuation 45542
 
4.6%
Close Punctuation 45542
 
4.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 69270
13.5%
a 62006
12.1%
e 55949
10.9%
m 53232
10.4%
o 48661
9.5%
i 39819
7.7%
n 35814
7.0%
y 28585
5.6%
c 28080
5.5%
t 26310
 
5.1%
Other values (12) 66234
12.9%
Uppercase Letter
ValueCountFrequency (%)
D 24249
25.3%
C 17513
18.3%
A 12062
12.6%
F 9789
10.2%
T 8413
 
8.8%
R 6748
 
7.0%
H 6078
 
6.3%
M 4848
 
5.1%
S 3065
 
3.2%
W 2367
 
2.5%
Other values (6) 783
 
0.8%
Other Punctuation
ValueCountFrequency (%)
' 182588
79.1%
, 48195
 
20.9%
Space Separator
ValueCountFrequency (%)
52040
100.0%
Open Punctuation
ValueCountFrequency (%)
[ 45542
100.0%
Close Punctuation
ValueCountFrequency (%)
] 45542
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 609875
62.0%
Common 373907
38.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 69270
11.4%
a 62006
 
10.2%
e 55949
 
9.2%
m 53232
 
8.7%
o 48661
 
8.0%
i 39819
 
6.5%
n 35814
 
5.9%
y 28585
 
4.7%
c 28080
 
4.6%
t 26310
 
4.3%
Other values (28) 162149
26.6%
Common
ValueCountFrequency (%)
' 182588
48.8%
52040
 
13.9%
, 48195
 
12.9%
[ 45542
 
12.2%
] 45542
 
12.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 983782
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
' 182588
18.6%
r 69270
 
7.0%
a 62006
 
6.3%
e 55949
 
5.7%
m 53232
 
5.4%
52040
 
5.3%
o 48661
 
4.9%
, 48195
 
4.9%
[ 45542
 
4.6%
] 45542
 
4.6%
Other values (33) 320757
32.6%

id
Categorical

HIGH CARDINALITY  UNIFORM 

Distinct: 45436
Distinct (%): 99.8%
Missing: 0
Missing (%): 0.0%
Memory size: 355.9 KiB

Value | Count
141971 | 9
14788 | 4
18440 | 4
15028 | 4
13209 | 4
Other values (45431) | 45517

Length

Max length: 10
Median length: 5
Mean length: 5.2516359
Min length: 1

Characters and Unicode

Total characters: 239170
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 45393
Unique (%): 99.7%

Sample

1st row: 862
2nd row: 8844
3rd row: 15602
4th row: 31357
5th row: 11862

Common Values

Value | Count | Frequency (%)
141971 | 9 | < 0.1%
14788 | 4 | < 0.1%
18440 | 4 | < 0.1%
15028 | 4 | < 0.1%
13209 | 4 | < 0.1%
12600 | 4 | < 0.1%
4912 | 4 | < 0.1%
10991 | 4 | < 0.1%
99080 | 4 | < 0.1%
152795 | 4 | < 0.1%
Other values (45426) | 45497 | 99.9%
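Several id values occur more than once, so id cannot serve as a unique key without deduplication. A minimal sketch of listing repeated identifiers with the standard library; repeated_ids is a hypothetical helper and the input list is made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def repeated_ids(ids: Iterable[str]) -> Dict[str, int]:
    """Map each identifier that occurs more than once to its occurrence count."""
    counts = Counter(ids)
    return {value: n for value, n in counts.items() if n > 1}


# Made-up identifiers; only the repetition pattern matters here.
ids = ["862", "8844", "141971", "141971", "14788", "14788", "14788"]
dupes = repeated_ids(ids)
```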

Length

Histogram of lengths of the category
ValueCountFrequency (%)
141971 9
 
< 0.1%
42495 4
 
< 0.1%
14788 4
 
< 0.1%
69234 4
 
< 0.1%
265189 4
 
< 0.1%
119916 4
 
< 0.1%
109962 4
 
< 0.1%
25541 4
 
< 0.1%
132641 4
 
< 0.1%
11115 4
 
< 0.1%
Other values (45426) 45497
99.9%

Most occurring characters

ValueCountFrequency (%)
1 33016
13.8%
2 28673
12.0%
3 26752
11.2%
4 24787
10.4%
5 22038
9.2%
6 21207
8.9%
7 20975
8.8%
8 20938
8.8%
9 20540
8.6%
0 20238
8.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 239164
> 99.9%
Dash Punctuation 6
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 33016
13.8%
2 28673
12.0%
3 26752
11.2%
4 24787
10.4%
5 22038
9.2%
6 21207
8.9%
7 20975
8.8%
8 20938
8.8%
9 20540
8.6%
0 20238
8.5%
Dash Punctuation
ValueCountFrequency (%)
- 6
100.0%
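The six dash characters, together with a maximum length of 10, indicate that a few id values are not plain integers. A minimal sketch of separating well-formed identifiers from the rest; split_ids is a hypothetical helper, and the date-like sample value is a guess at what such a row might contain, not something the report shows.

```python
import re
from typing import Iterable, List, Tuple

_DIGITS_ONLY = re.compile(r"[0-9]+")


def split_ids(ids: Iterable[str]) -> Tuple[List[int], List[str]]:
    """Split raw id strings into parsed integers and rejected raw values."""
    valid: List[int] = []
    rejected: List[str] = []
    for raw in ids:
        text = raw.strip()
        if _DIGITS_ONLY.fullmatch(text):
            valid.append(int(text))
        else:
            rejected.append(raw)
    return valid, rejected


# "1997-08-20" is a made-up example of a value containing dashes.
valid, rejected = split_ids(["862", "8844", "1997-08-20"])
```

Inspecting the rejected values before casting the column to an integer type shows whether those rows are shifted or otherwise malformed.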

Most occurring scripts

ValueCountFrequency (%)
Common 239170
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 33016
13.8%
2 28673
12.0%
3 26752
11.2%
4 24787
10.4%
5 22038
9.2%
6 21207
8.9%
7 20975
8.8%
8 20938
8.8%
9 20540
8.6%
0 20238
8.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 239170
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 33016
13.8%
2 28673
12.0%
3 26752
11.2%
4 24787
10.4%
5 22038
9.2%
6 21207
8.9%
7 20975
8.8%
8 20938
8.8%
9 20540
8.6%
0 20238
8.5%

original_language
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct: 92
Distinct (%): 0.2%
Missing: 11
Missing (%): < 0.1%
Memory size: 355.9 KiB

Value | Count
en | 32316
fr | 2443
it | 1529
ja | 1356
de | 1083
Other values (87) | 6804

Length

Max length: 5
Median length: 2
Mean length: 2.0001537
Min length: 2

Characters and Unicode

Total characters: 91069
Distinct characters: 33
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 20
Unique (%): < 0.1%

Sample

1st row: en
2nd row: en
3rd row: en
4th row: en
5th row: en

Common Values

Value | Count | Frequency (%)
en | 32316 | 71.0%
fr | 2443 | 5.4%
it | 1529 | 3.4%
ja | 1356 | 3.0%
de | 1083 | 2.4%
es | 994 | 2.2%
ru | 826 | 1.8%
hi | 508 | 1.1%
ko | 444 | 1.0%
zh | 409 | 0.9%
Other values (82) | 3623 | 8.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
en 32316
71.0%
fr 2443
 
5.4%
it 1529
 
3.4%
ja 1356
 
3.0%
de 1083
 
2.4%
es 994
 
2.2%
ru 826
 
1.8%
hi 508
 
1.1%
ko 444
 
1.0%
zh 409
 
0.9%
Other values (82) 3623
 
8.0%

Most occurring characters

ValueCountFrequency (%)
e 34648
38.0%
n 33025
36.3%
r 3641
 
4.0%
f 2852
 
3.1%
i 2397
 
2.6%
t 2254
 
2.5%
a 1851
 
2.0%
s 1656
 
1.8%
j 1357
 
1.5%
d 1330
 
1.5%
Other values (23) 6058
 
6.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 91056
> 99.9%
Decimal Number 10
 
< 0.1%
Other Punctuation 3
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 34648
38.1%
n 33025
36.3%
r 3641
 
4.0%
f 2852
 
3.1%
i 2397
 
2.6%
t 2254
 
2.5%
a 1851
 
2.0%
s 1656
 
1.8%
j 1357
 
1.5%
d 1330
 
1.5%
Other values (16) 6045
 
6.6%
Decimal Number
ValueCountFrequency (%)
0 4
40.0%
8 2
20.0%
2 1
 
10.0%
6 1
 
10.0%
1 1
 
10.0%
4 1
 
10.0%
Other Punctuation
ValueCountFrequency (%)
. 3
100.0%
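The ten digits and three periods, along with a maximum length of 5, show that a handful of original_language values are not language codes. A minimal sketch of keeping only two-letter codes; clean_language is a hypothetical helper, and the two-letter rule and the lowercase normalisation are assumptions based on the length statistics above.

```python
import re
from typing import Optional

_TWO_LETTERS = re.compile(r"[a-z]{2}")


def clean_language(code: Optional[str]) -> Optional[str]:
    """Return a normalised two-letter code, or None for anything else."""
    if code is None:
        return None
    normalised = code.strip().lower()
    return normalised if _TWO_LETTERS.fullmatch(normalised) else None


# "82.0" is a made-up example of a value with digits and a period.
cleaned = [clean_language(v) for v in ["en", " FR ", "82.0", None]]
```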

Most occurring scripts

ValueCountFrequency (%)
Latin 91056
> 99.9%
Common 13
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 34648
38.1%
n 33025
36.3%
r 3641
 
4.0%
f 2852
 
3.1%
i 2397
 
2.6%
t 2254
 
2.5%
a 1851
 
2.0%
s 1656
 
1.8%
j 1357
 
1.5%
d 1330
 
1.5%
Other values (16) 6045
 
6.6%
Common
ValueCountFrequency (%)
0 4
30.8%
. 3
23.1%
8 2
15.4%
2 1
 
7.7%
6 1
 
7.7%
1 1
 
7.7%
4 1
 
7.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 91069
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 34648
38.0%
n 33025
36.3%
r 3641
 
4.0%
f 2852
 
3.1%
i 2397
 
2.6%
t 2254
 
2.5%
a 1851
 
2.0%
s 1656
 
1.8%
j 1357
 
1.5%
d 1330
 
1.5%
Other values (23) 6058
 
6.7%

overview
Categorical

HIGH CARDINALITY  MISSING  UNIFORM 

Distinct: 44307
Distinct (%): 99.4%
Missing: 954
Missing (%): 2.1%
Memory size: 355.9 KiB

Value | Count
No overview found. | 133
Recovering from a nail gun shot to the head and 13 months of coma, doctor Pekka Valinta starts to unravel the mystery of his past, still suffering from total amnesia. | 9
No Overview | 7
(blank) | 5
King Lear, old and tired, divides his kingdom among his daughters, giving great importance to their protestations of love for him. When Cordelia, youngest and most honest, refuses to idly flatter the old man in return for favor, he banishes her and turns for support to his remaining daughters. But Goneril and Regan have no love for him and instead plot to take all his power from him. In a parallel, Lear's loyal courtier Gloucester favors his illegitimate son Edmund after being told lies about his faithful son Edgar. Madness and tragedy befall both ill-starred fathers. | 5
Other values (44302) | 44429

Length

Max length: 1000
Median length: 785
Mean length: 323.36072
Min length: 1

Characters and Unicode

Total characters: 14418008
Distinct characters: 429
Distinct categories: 25
Distinct scripts: 13
Distinct blocks: 21
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 44233
Unique (%): 99.2%

Sample

1st row: Led by Woody, Andy's toys live happily in his room until Andy's birthday brings Buzz Lightyear onto the scene. Afraid of losing his place in Andy's heart, Woody plots against Buzz. But when circumstances separate Buzz and Woody from their owner, the duo eventually learns to put aside their differences.
2nd row: When siblings Judy and Peter discover an enchanted board game that opens the door to a magical world, they unwittingly invite Alan -- an adult who's been trapped inside the game for 26 years -- into their living room. Alan's only hope for freedom is to finish the game, which proves risky as all three find themselves running from giant rhinoceroses, evil monkeys and other terrifying creatures.
3rd row: A family wedding reignites the ancient feud between next-door neighbors and fishing buddies John and Max. Meanwhile, a sultry Italian divorcée opens a restaurant at the local bait shop, alarming the locals who worry she'll scare the fish away. But she's less interested in seafood than she is in cooking up a hot time with Max.
4th row: Cheated on, mistreated and stepped on, the women are holding their breath, waiting for the elusive "good man" to break a string of less-than-stellar lovers. Friends and confidants Vannah, Bernie, Glo and Robin talk it all out, determined to find a better way to breathe.
5th row: Just when George Banks has recovered from his daughter's wedding, he receives the news that she's pregnant ... and that George's wife, Nina, is expecting too. He was planning on selling their home, but that's a plan that -- like George -- will have to change with the arrival of both a grandchild and a kid of his own.

Common Values

Value | Count | Frequency (%)
No overview found. | 133 | 0.3%
Recovering from a nail gun shot to the head and 13 months of coma, doctor Pekka Valinta starts to unravel the mystery of his past, still suffering from total amnesia. | 9 | < 0.1%
No Overview | 7 | < 0.1%
(blank) | 5 | < 0.1%
King Lear, old and tired, divides his kingdom among his daughters, giving great importance to their protestations of love for him. When Cordelia, youngest and most honest, refuses to idly flatter the old man in return for favor, he banishes her and turns for support to his remaining daughters. But Goneril and Regan have no love for him and instead plot to take all his power from him. In a parallel, Lear's loyal courtier Gloucester favors his illegitimate son Edmund after being told lies about his faithful son Edgar. Madness and tragedy befall both ill-starred fathers. | 5 | < 0.1%
Prospero, the true Duke of Milan is now living on an enchanted island with his daughter Miranda, the savage Caliban and Ariel, a spirit of the air. Raising a sorm to bring his brother - the usurper of his dukedom - along with his royal entourage. to the island. Prospero contrives his revenge. | 4 | < 0.1%
East-Berlin, 1961, shortly after the erection of the Wall. Konrad, Sophie and three of their friends plan a daring escape to Western Germany. The attempt is successful, except for Konrad, who remains behind. From then on, and for the next 28 years, Konrad and Sophie will attempt to meet again, in spite of the Iron Curtain. Konrad, who has become a reputed Astrophysicist, tries to take advantage of scientific congresses outside Eastern Germany to arrange encounters with Sophie. But in a country where the political police, the Stasi, monitors the moves of all suspicious people (such as Konrad's sister Barbara and her husband Harald), preserving one's privacy, ideals and self-respect becomes an exhausting fight, even as the Eastern block begins its long process of disintegration. | 4 | < 0.1%
Since women are banned from soccer matches, Iranian females masquerade as males so they can slip into Tehran's stadium to see the game between Iran and Bahrain. The ones who are caught and arrested are taken to a holding area and guarded by soldiers. One sympathetic soldier agrees to watch the game through a peephole and recount the action to the impatient fans. | 4 | < 0.1%
Two literary women compete for 20 years: one writes for the critics; the other one, to get rich. | 4 | < 0.1%
In a hospital, ten soldiers are being treated for a mysterious sleeping sickness. In a story in which dreams can be experienced by others, and in which goddesses can sit casually with mortals, a nurse learns the reason why the patients will never be cured, and forms a telepathic bond with one of them. | 4 | < 0.1%
Other values (44297) | 44409 | 97.5%
(Missing) | 954 | 2.1%
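The most frequent overview values are placeholders ("No overview found.", "No Overview", and a blank entry), so the 2.1% missing rate understates how many rows lack a real description. A minimal sketch of mapping such placeholders to a missing value; clean_overview is a hypothetical helper, and the placeholder set covers only the strings visible in the table above.

```python
from typing import Optional

# Lowercased placeholder strings seen in the Common Values table.
PLACEHOLDERS = {"no overview found.", "no overview"}


def clean_overview(text: Optional[str]) -> Optional[str]:
    """Return the trimmed overview, or None for blanks and placeholders."""
    if text is None:
        return None
    stripped = text.strip()
    if not stripped or stripped.lower() in PLACEHOLDERS:
        return None
    return stripped


results = [clean_overview(v) for v in ["No overview found.", "No Overview", " ", "A real plot."]]
```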

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the 138629
 
5.6%
a 99198
 
4.0%
and 75560
 
3.1%
to 73582
 
3.0%
of 69846
 
2.8%
in 48314
 
2.0%
is 36601
 
1.5%
his 36290
 
1.5%
with 23983
 
1.0%
her 21568
 
0.9%
Other values (97181) 1833961
74.6%

Most occurring characters

ValueCountFrequency (%)
2415028
16.8%
e 1368592
 
9.5%
a 944079
 
6.5%
t 938160
 
6.5%
i 854670
 
5.9%
o 832905
 
5.8%
n 825681
 
5.7%
s 770623
 
5.3%
r 747089
 
5.2%
h 602913
 
4.2%
Other values (419) 4118268
28.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 11190655
77.6%
Space Separator 2415066
 
16.8%
Uppercase Letter 392491
 
2.7%
Other Punctuation 313935
 
2.2%
Decimal Number 42400
 
0.3%
Dash Punctuation 36898
 
0.3%
Close Punctuation 10127
 
0.1%
Open Punctuation 10105
 
0.1%
Final Punctuation 4574
 
< 0.1%
Initial Punctuation 888
 
< 0.1%
Other values (15) 869
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 1368592
12.2%
a 944079
 
8.4%
t 938160
 
8.4%
i 854670
 
7.6%
o 832905
 
7.4%
n 825681
 
7.4%
s 770623
 
6.9%
r 747089
 
6.7%
h 602913
 
5.4%
l 480600
 
4.3%
Other values (142) 2825343
25.2%
Uppercase Letter
ValueCountFrequency (%)
A 42898
 
10.9%
T 36105
 
9.2%
S 31263
 
8.0%
M 24037
 
6.1%
B 23794
 
6.1%
C 22904
 
5.8%
H 19496
 
5.0%
W 18730
 
4.8%
I 16876
 
4.3%
D 16361
 
4.2%
Other values (77) 140027
35.7%
Other Letter
ValueCountFrequency (%)
6
 
4.8%
6
 
4.8%
5
 
4.0%
4
 
3.2%
3
 
2.4%
3
 
2.4%
3
 
2.4%
3
 
2.4%
2
 
1.6%
2
 
1.6%
Other values (76) 88
70.4%
Other Punctuation
ValueCountFrequency (%)
, 133945
42.7%
. 125208
39.9%
' 31228
 
9.9%
" 11701
 
3.7%
: 3316
 
1.1%
? 2766
 
0.9%
; 2499
 
0.8%
! 1552
 
0.5%
/ 769
 
0.2%
& 457
 
0.1%
Other values (12) 494
 
0.2%
Nonspacing Mark
ValueCountFrequency (%)
ి 4
12.1%
́ 4
12.1%
̈ 3
9.1%
3
9.1%
3
9.1%
3
9.1%
2
 
6.1%
2
 
6.1%
2
 
6.1%
2
 
6.1%
Other values (4) 5
15.2%
Decimal Number
ValueCountFrequency (%)
1 9792
23.1%
0 8300
19.6%
9 6434
15.2%
2 4270
10.1%
5 2448
 
5.8%
8 2386
 
5.6%
3 2357
 
5.6%
4 2188
 
5.2%
7 2135
 
5.0%
6 2090
 
4.9%
Spacing Mark
ValueCountFrequency (%)
11
40.7%
4
 
14.8%
3
 
11.1%
3
 
11.1%
ि 2
 
7.4%
2
 
7.4%
1
 
3.7%
ி 1
 
3.7%
Dash Punctuation
ValueCountFrequency (%)
- 35371
95.9%
885
 
2.4%
633
 
1.7%
5
 
< 0.1%
4
 
< 0.1%
Other Symbol
ValueCountFrequency (%)
® 45
70.3%
14
 
21.9%
° 2
 
3.1%
¦ 2
 
3.1%
1
 
1.6%
Math Symbol
ValueCountFrequency (%)
~ 20
46.5%
+ 12
27.9%
= 6
 
14.0%
| 4
 
9.3%
1
 
2.3%
Open Punctuation
ValueCountFrequency (%)
( 10051
99.5%
[ 51
 
0.5%
{ 2
 
< 0.1%
1
 
< 0.1%
Currency Symbol
ValueCountFrequency (%)
$ 318
96.4%
£ 10
 
3.0%
1
 
0.3%
1
 
0.3%
Space Separator
ValueCountFrequency (%)
2415028
> 99.9%
  36
 
< 0.1%
  2
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 10075
99.5%
] 50
 
0.5%
} 2
 
< 0.1%
Final Punctuation
ValueCountFrequency (%)
3860
84.4%
695
 
15.2%
» 19
 
0.4%
Initial Punctuation
ValueCountFrequency (%)
677
76.2%
193
 
21.7%
« 18
 
2.0%
Control
ValueCountFrequency (%)
106
96.4%
’ 3
 
2.7%
 1
 
0.9%
Modifier Symbol
ValueCountFrequency (%)
´ 25
65.8%
` 12
31.6%
¯ 1
 
2.6%
Format
ValueCountFrequency (%)
31
60.8%
­ 20
39.2%
Other Number
ValueCountFrequency (%)
½ 8
50.0%
¹ 8
50.0%
Connector Punctuation
ValueCountFrequency (%)
_ 19
100.0%
Line Separator
ValueCountFrequency (%)
7
100.0%
Paragraph Separator
ValueCountFrequency (%)
2
100.0%
Letter Number
ValueCountFrequency (%)
2
100.0%
Modifier Letter
ValueCountFrequency (%)
ʼ 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 11577914
80.3%
Common 2834675
 
19.7%
Cyrillic 4587
 
< 0.1%
Greek 648
 
< 0.1%
Devanagari 77
 
< 0.1%
Telugu 30
 
< 0.1%
Hiragana 20
 
< 0.1%
Tamil 19
 
< 0.1%
Han 10
 
< 0.1%
Hangul 9
 
< 0.1%
Other values (3) 19
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 1368592
11.8%
a 944079
 
8.2%
t 938160
 
8.1%
i 854670
 
7.4%
o 832905
 
7.2%
n 825681
 
7.1%
s 770623
 
6.7%
r 747089
 
6.5%
h 602913
 
5.2%
l 480600
 
4.2%
Other values (132) 3212602
27.7%
Common
ValueCountFrequency (%)
2415028
85.2%
, 133945
 
4.7%
. 125208
 
4.4%
- 35371
 
1.2%
' 31228
 
1.1%
" 11701
 
0.4%
) 10075
 
0.4%
( 10051
 
0.4%
1 9792
 
0.3%
0 8300
 
0.3%
Other values (71) 43976
 
1.6%
Cyrillic
ValueCountFrequency (%)
о 470
 
10.2%
е 404
 
8.8%
а 373
 
8.1%
н 323
 
7.0%
и 299
 
6.5%
т 265
 
5.8%
р 240
 
5.2%
с 218
 
4.8%
в 173
 
3.8%
л 161
 
3.5%
Other values (46) 1661
36.2%
Greek
ValueCountFrequency (%)
α 60
 
9.3%
ο 55
 
8.5%
τ 43
 
6.6%
ι 36
 
5.6%
η 36
 
5.6%
ν 34
 
5.2%
ε 31
 
4.8%
ρ 31
 
4.8%
ς 30
 
4.6%
π 30
 
4.6%
Other values (33) 262
40.4%
Devanagari
ValueCountFrequency (%)
11
 
14.3%
6
 
7.8%
6
 
7.8%
5
 
6.5%
4
 
5.2%
3
 
3.9%
3
 
3.9%
3
 
3.9%
3
 
3.9%
3
 
3.9%
Other values (21) 30
39.0%
Hiragana
ValueCountFrequency (%)
4
20.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
Other values (7) 7
35.0%
Telugu
ValueCountFrequency (%)
ి 4
13.3%
3
10.0%
3
10.0%
3
10.0%
2
 
6.7%
2
 
6.7%
2
 
6.7%
2
 
6.7%
2
 
6.7%
1
 
3.3%
Other values (6) 6
20.0%
Tamil
ValueCountFrequency (%)
3
15.8%
2
10.5%
2
10.5%
2
10.5%
2
10.5%
1
 
5.3%
1
 
5.3%
ி 1
 
5.3%
1
 
5.3%
1
 
5.3%
Other values (3) 3
15.8%
Han
ValueCountFrequency (%)
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
Hangul
ValueCountFrequency (%)
2
22.2%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
Thai
ValueCountFrequency (%)
2
25.0%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
Arabic
ValueCountFrequency (%)
م 2
50.0%
ہ 1
25.0%
ت 1
25.0%
Inherited
ValueCountFrequency (%)
́ 4
57.1%
̈ 3
42.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 14399960
99.9%
Punctuation 7299
 
0.1%
None 5951
 
< 0.1%
Cyrillic 4587
 
< 0.1%
Devanagari 77
 
< 0.1%
Telugu 30
 
< 0.1%
Hiragana 20
 
< 0.1%
Tamil 19
 
< 0.1%
Letterlike Symbols 14
 
< 0.1%
CJK 10
 
< 0.1%
Other values (11) 41
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2415028
16.8%
e 1368592
 
9.5%
a 944079
 
6.6%
t 938160
 
6.5%
i 854670
 
5.9%
o 832905
 
5.8%
n 825681
 
5.7%
s 770623
 
5.4%
r 747089
 
5.2%
h 602913
 
4.2%
Other values (82) 4100220
28.5%
Punctuation
ValueCountFrequency (%)
3860
52.9%
885
 
12.1%
695
 
9.5%
677
 
9.3%
633
 
8.7%
304
 
4.2%
193
 
2.6%
31
 
0.4%
7
 
0.1%
5
 
0.1%
Other values (4) 9
 
0.1%
None
ValueCountFrequency (%)
é 1568
26.3%
ä 294
 
4.9%
á 293
 
4.9%
ö 250
 
4.2%
í 244
 
4.1%
è 209
 
3.5%
ü 178
 
3.0%
ı 165
 
2.8%
ó 164
 
2.8%
ç 158
 
2.7%
Other values (141) 2428
40.8%
Cyrillic
ValueCountFrequency (%)
о 470
 
10.2%
е 404
 
8.8%
а 373
 
8.1%
н 323
 
7.0%
и 299
 
6.5%
т 265
 
5.8%
р 240
 
5.2%
с 218
 
4.8%
в 173
 
3.8%
л 161
 
3.5%
Other values (46) 1661
36.2%
Letterlike Symbols
ValueCountFrequency (%)
14
100.0%
Devanagari
ValueCountFrequency (%)
11
 
14.3%
6
 
7.8%
6
 
7.8%
5
 
6.5%
4
 
5.2%
3
 
3.9%
3
 
3.9%
3
 
3.9%
3
 
3.9%
3
 
3.9%
Other values (21) 30
39.0%
Telugu
ValueCountFrequency (%)
ి 4
13.3%
3
10.0%
3
10.0%
3
10.0%
2
 
6.7%
2
 
6.7%
2
 
6.7%
2
 
6.7%
2
 
6.7%
1
 
3.3%
Other values (6) 6
20.0%
Hiragana
ValueCountFrequency (%)
4
20.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
Other values (7) 7
35.0%
Diacriticals
ValueCountFrequency (%)
́ 4
57.1%
̈ 3
42.9%
Alphabetic PF
ValueCountFrequency (%)
4
100.0%
Tamil
ValueCountFrequency (%)
3
15.8%
2
10.5%
2
10.5%
2
10.5%
2
10.5%
1
 
5.3%
1
 
5.3%
ி 1
 
5.3%
1
 
5.3%
1
 
5.3%
Other values (3) 3
15.8%
Hangul
ValueCountFrequency (%)
2
22.2%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
Arabic
ValueCountFrequency (%)
م 2
50.0%
ہ 1
25.0%
ت 1
25.0%
Thai
ValueCountFrequency (%)
2
25.0%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
Number Forms
ValueCountFrequency (%)
2
100.0%
Modifier Letters
ValueCountFrequency (%)
ʼ 2
100.0%
CJK
ValueCountFrequency (%)
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
Math Operators
ValueCountFrequency (%)
1
100.0%
Katakana
ValueCountFrequency (%)
1
100.0%
Currency Symbols
ValueCountFrequency (%)
1
50.0%
1
50.0%
Specials
ValueCountFrequency (%)
1
100.0%

popularity
Unsupported

REJECTED  UNSUPPORTED 

Missing: 5
Missing (%): < 0.1%
Memory size: 355.9 KiB

poster_path
Categorical

HIGH CARDINALITY  UNIFORM 

Distinct: 45024
Distinct (%): 99.7%
Missing: 386
Missing (%): 0.8%
Memory size: 355.9 KiB
/8VSZ9coCzxOCW2wE2Qene1H1fKO.jpg
 
9
/5D7UBSEgdyONE6Lql6xS7s6OLcW.jpg
 
5
/5GasjPRAy5rlEyDOH7MeOyxyQGX.jpg
 
4
/q19Q5BRZpMXoNCA4OYodVozfjUh.jpg
 
4
/sGMPDg6je1zKi0TiX9b4pP6yN02.jpg
 
4
Other values (45019)
45130 

Length

Max length: 35
Median length: 32
Mean length: 31.971676
Min length: 12

Characters and Unicode

Total characters: 1443713
Distinct characters: 66
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
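
As a rough illustration of the idea, the per-category character counts shown in the tables of this report can be approximated with Python's standard `unicodedata` module. This is a minimal sketch, not the profiler's own code: the helper name `category_counts` and the `CATEGORY_NAMES` mapping are made up for this example, and only five category codes are spelled out (other codes are left as their two-letter abbreviations). The standard library exposes general categories only, so scripts and blocks would need another source.

```python
import unicodedata
from collections import Counter

# Long-form names for a few Unicode general-category codes (illustrative subset).
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Zs": "Space Separator",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)  # e.g. "Ll", "Po"
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# One value taken from the Sample section of poster_path.
sample = ["/rhIRbceoE9lR4veEXuwCC2wARtG.jpg"]
print(category_counts(sample).most_common())
```

Running the helper over a whole column (skipping missing cells) would give totals comparable to the "Most occurring categories" table.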

Unique

Unique: 44963
Unique (%): 99.6%

Sample

1st row: /rhIRbceoE9lR4veEXuwCC2wARtG.jpg
2nd row: /vzmL6fP7aPKNKPRTFnZmiUfciyV.jpg
3rd row: /6ksm1sjKMFLbO7UY2i6G1ju9SML.jpg
4th row: /16XOMpEaLWkrcPqSQqhTmeJuqQl.jpg
5th row: /e64sOI48hQXyru7naBFyssKFxVd.jpg

Common Values

Value | Count | Frequency (%)
/8VSZ9coCzxOCW2wE2Qene1H1fKO.jpg | 9 | < 0.1%
/5D7UBSEgdyONE6Lql6xS7s6OLcW.jpg | 5 | < 0.1%
/5GasjPRAy5rlEyDOH7MeOyxyQGX.jpg | 4 | < 0.1%
/q19Q5BRZpMXoNCA4OYodVozfjUh.jpg | 4 | < 0.1%
/sGMPDg6je1zKi0TiX9b4pP6yN02.jpg | 4 | < 0.1%
/z9WiHt5uQjs8L8tyBpRBKzlheF2.jpg | 4 | < 0.1%
/gLVRTxaLtUDkfscFKPyYrCtRnTk.jpg | 4 | < 0.1%
/nfkOkpudNNIjRrf0mTFVoiGzHyc.jpg | 4 | < 0.1%
/jn8L1QdWWX5c0NUOLjzaSXtZrbt.jpg | 4 | < 0.1%
/xGhDPrBz9mJN8CsIjA23jQSd3sc.jpg | 4 | < 0.1%
Other values (45014) | 45110 | 99.1%
(Missing) | 386 | 0.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
8vsz9coczxocw2we2qene1h1fko.jpg 9
 
< 0.1%
5d7ubsegdyone6lql6xs7s6olcw.jpg 5
 
< 0.1%
nnkx3ahyot7p3au92dnglf4pkwa.jpg 4
 
< 0.1%
qenjwrvw9itr5pvp4cbkyfhvaop.jpg 4
 
< 0.1%
qw1oqlohizrhxzqrpkimyr0oxzn.jpg 4
 
< 0.1%
twcykxhusrqdlavneevjbnhf1yv.jpg 4
 
< 0.1%
iqd7zwhsece3cgdpclidxjgfdzl.jpg 4
 
< 0.1%
k0mf0iibj2pfoiku2kyraxl72d8.jpg 4
 
< 0.1%
5iljs6xb5deihop8sxpsyxxwvpe.jpg 4
 
< 0.1%
w56oo9nrecf54snxvyue9qxzfjt.jpg 4
 
< 0.1%
Other values (45020) 45116
99.9%

Most occurring characters

ValueCountFrequency (%)
g 65406
 
4.5%
p 65258
 
4.5%
j 65164
 
4.5%
. 45153
 
3.1%
/ 45153
 
3.1%
v 20463
 
1.4%
d 20360
 
1.4%
m 20349
 
1.4%
t 20290
 
1.4%
q 20276
 
1.4%
Other values (56) 1055841
73.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 660240
45.7%
Uppercase Letter 492998
34.1%
Decimal Number 200162
 
13.9%
Other Punctuation 90307
 
6.3%
Space Separator 6
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
g 65406
 
9.9%
p 65258
 
9.9%
j 65164
 
9.9%
v 20463
 
3.1%
d 20360
 
3.1%
m 20349
 
3.1%
t 20290
 
3.1%
q 20276
 
3.1%
n 20270
 
3.1%
l 20258
 
3.1%
Other values (16) 322146
48.8%
Uppercase Letter
ValueCountFrequency (%)
A 19428
 
3.9%
R 19224
 
3.9%
M 19204
 
3.9%
C 19194
 
3.9%
W 19182
 
3.9%
V 19176
 
3.9%
T 19005
 
3.9%
K 19004
 
3.9%
L 19002
 
3.9%
D 18970
 
3.8%
Other values (16) 301609
61.2%
Decimal Number
ValueCountFrequency (%)
1 20254
10.1%
8 20250
10.1%
3 20187
10.1%
9 20145
10.1%
5 20138
10.1%
2 20092
10.0%
6 20033
10.0%
4 20033
10.0%
7 19923
10.0%
0 19107
9.5%
Other Punctuation
ValueCountFrequency (%)
. 45153
50.0%
/ 45153
50.0%
: 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
6
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1153238
79.9%
Common 290475
 
20.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
g 65406
 
5.7%
p 65258
 
5.7%
j 65164
 
5.7%
v 20463
 
1.8%
d 20360
 
1.8%
m 20349
 
1.8%
t 20290
 
1.8%
q 20276
 
1.8%
n 20270
 
1.8%
l 20258
 
1.8%
Other values (42) 815144
70.7%
Common
ValueCountFrequency (%)
. 45153
15.5%
/ 45153
15.5%
1 20254
7.0%
8 20250
7.0%
3 20187
6.9%
9 20145
6.9%
5 20138
6.9%
2 20092
6.9%
6 20033
6.9%
4 20033
6.9%
Other values (4) 39037
13.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1443713
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
g 65406
 
4.5%
p 65258
 
4.5%
j 65164
 
4.5%
. 45153
 
3.1%
/ 45153
 
3.1%
v 20463
 
1.4%
d 20360
 
1.4%
m 20349
 
1.4%
t 20290
 
1.4%
q 20276
 
1.4%
Other values (56) 1055841
73.1%

production_companies
Categorical

HIGH CARDINALITY 

Distinct: 22581
Distinct (%): 49.6%
Missing: 3
Missing (%): < 0.1%
Memory size: 355.9 KiB
[]
11958 
['Metro-Goldwyn-Mayer (MGM)']
 
772
['Warner Bros.']
 
540
['Paramount Pictures']
 
507
['Twentieth Century Fox Film Corporation']
 
441
Other values (22576)
31321 

Length

Max length: 663
Median length: 489
Mean length: 35.501109
Min length: 2

Characters and Unicode

Total characters: 1616685
Distinct characters: 291
Distinct categories: 15
Distinct scripts: 6
Distinct blocks: 6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 20216
Unique (%): 44.4%

Sample

1st row: ['Pixar Animation Studios']
2nd row: ['TriStar Pictures', 'Teitler Film', 'Interscope Communications']
3rd row: ['Warner Bros.', 'Lancaster Gate']
4th row: ['Twentieth Century Fox Film Corporation']
5th row: ['Sandollar Productions', 'Touchstone Pictures']
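
The sample rows suggest that each cell holds a Python-style list written out as a string, which would explain why nearly every combination of companies counts as its own distinct value. A minimal sketch of one way to count individual companies instead of whole cells follows. It assumes every non-missing cell is a valid Python list literal, which the report does not confirm; the helper names `parse_list_cell` and `count_members` are hypothetical.

```python
import ast
from collections import Counter


def parse_list_cell(cell):
    """Turn a string such as "['A', 'B']" into a list; non-string (missing) cells become []."""
    if not isinstance(cell, str):
        return []
    # literal_eval only evaluates literals, so it is safer than eval for data cells.
    return ast.literal_eval(cell)


def count_members(cells):
    """Count how often each list member appears across all cells."""
    counts = Counter()
    for cell in cells:
        counts.update(parse_list_cell(cell))
    return counts


# Three rows in the style of the Sample section, plus an empty list.
rows = [
    "['Pixar Animation Studios']",
    "['Warner Bros.', 'Lancaster Gate']",
    "['Warner Bros.']",
    "[]",
]
print(count_members(rows).most_common(2))
```

The same approach would apply to the other list-like columns in this report, such as production_countries and spoken_languages.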

Common Values

Value | Count | Frequency (%)
[] | 11958 | 26.3%
['Metro-Goldwyn-Mayer (MGM)'] | 772 | 1.7%
['Warner Bros.'] | 540 | 1.2%
['Paramount Pictures'] | 507 | 1.1%
['Twentieth Century Fox Film Corporation'] | 441 | 1.0%
['Universal Pictures'] | 322 | 0.7%
['RKO Radio Pictures'] | 247 | 0.5%
['Columbia Pictures Corporation'] | 207 | 0.5%
['Columbia Pictures'] | 147 | 0.3%
['Mosfilm'] | 145 | 0.3%
Other values (22571) | 30253 | 66.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
12801
 
6.8%
films 9400
 
5.0%
pictures 9274
 
4.9%
productions 9005
 
4.8%
film 6673
 
3.5%
entertainment 5149
 
2.7%
corporation 2190
 
1.2%
company 1749
 
0.9%
warner 1478
 
0.8%
bros 1411
 
0.7%
Other values (18395) 129356
68.6%

Most occurring characters

ValueCountFrequency (%)
142959
 
8.8%
' 140576
 
8.7%
i 106400
 
6.6%
e 93948
 
5.8%
n 89480
 
5.5%
o 84817
 
5.2%
r 83218
 
5.1%
t 83083
 
5.1%
a 76920
 
4.8%
s 62156
 
3.8%
Other values (281) 653128
40.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 981535
60.7%
Uppercase Letter 197983
 
12.2%
Other Punctuation 185006
 
11.4%
Space Separator 142959
 
8.8%
Open Punctuation 49846
 
3.1%
Close Punctuation 49845
 
3.1%
Decimal Number 4357
 
0.3%
Dash Punctuation 4308
 
0.3%
Math Symbol 666
 
< 0.1%
Other Letter 140
 
< 0.1%
Other values (5) 40
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 106400
10.8%
e 93948
9.6%
n 89480
9.1%
o 84817
8.6%
r 83218
8.5%
t 83083
8.5%
a 76920
 
7.8%
s 62156
 
6.3%
l 50877
 
5.2%
m 44113
 
4.5%
Other values (102) 206523
21.0%
Other Letter
ValueCountFrequency (%)
9
 
6.4%
8
 
5.7%
6
 
4.3%
5
 
3.6%
5
 
3.6%
5
 
3.6%
5
 
3.6%
5
 
3.6%
4
 
2.9%
3
 
2.1%
Other values (62) 85
60.7%
Uppercase Letter
ValueCountFrequency (%)
P 27812
14.0%
F 26283
13.3%
C 20428
 
10.3%
M 13340
 
6.7%
S 11881
 
6.0%
E 9684
 
4.9%
A 9426
 
4.8%
T 9352
 
4.7%
B 8966
 
4.5%
G 7806
 
3.9%
Other values (52) 53005
26.8%
Other Punctuation
ValueCountFrequency (%)
' 140576
76.0%
, 37114
 
20.1%
. 5668
 
3.1%
& 764
 
0.4%
/ 648
 
0.4%
" 133
 
0.1%
! 36
 
< 0.1%
\ 24
 
< 0.1%
% 18
 
< 0.1%
: 9
 
< 0.1%
Other values (6) 16
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
2 1041
23.9%
1 716
16.4%
0 652
15.0%
3 558
12.8%
4 482
11.1%
9 204
 
4.7%
6 197
 
4.5%
7 174
 
4.0%
5 171
 
3.9%
8 162
 
3.7%
Open Punctuation
ValueCountFrequency (%)
[ 45548
91.4%
( 4297
 
8.6%
1
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
] 45548
91.4%
) 4296
 
8.6%
1
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
- 4306
> 99.9%
2
 
< 0.1%
Math Symbol
ValueCountFrequency (%)
+ 665
99.8%
| 1
 
0.2%
Other Symbol
ValueCountFrequency (%)
° 23
92.0%
2
 
8.0%
Final Punctuation
ValueCountFrequency (%)
3
50.0%
» 3
50.0%
Other Number
ValueCountFrequency (%)
½ 1
50.0%
² 1
50.0%
Space Separator
ValueCountFrequency (%)
142959
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 4
100.0%
Initial Punctuation
ValueCountFrequency (%)
« 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1179115
72.9%
Common 437025
 
27.0%
Cyrillic 373
 
< 0.1%
Hangul 115
 
< 0.1%
Greek 31
 
< 0.1%
Han 26
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 106400
 
9.0%
e 93948
 
8.0%
n 89480
 
7.6%
o 84817
 
7.2%
r 83218
 
7.1%
t 83083
 
7.0%
a 76920
 
6.5%
s 62156
 
5.3%
l 50877
 
4.3%
m 44113
 
3.7%
Other values (99) 404103
34.3%
Hangul
ValueCountFrequency (%)
9
 
7.8%
8
 
7.0%
6
 
5.2%
5
 
4.3%
5
 
4.3%
5
 
4.3%
5
 
4.3%
5
 
4.3%
4
 
3.5%
3
 
2.6%
Other values (43) 60
52.2%
Cyrillic
ValueCountFrequency (%)
и 34
 
9.1%
о 28
 
7.5%
а 26
 
7.0%
л 22
 
5.9%
н 20
 
5.4%
м 19
 
5.1%
т 17
 
4.6%
ь 16
 
4.3%
е 16
 
4.3%
с 16
 
4.3%
Other values (36) 159
42.6%
Common
ValueCountFrequency (%)
142959
32.7%
' 140576
32.2%
[ 45548
 
10.4%
] 45548
 
10.4%
, 37114
 
8.5%
. 5668
 
1.3%
- 4306
 
1.0%
( 4297
 
1.0%
) 4296
 
1.0%
2 1041
 
0.2%
Other values (34) 5672
 
1.3%
Greek
ValueCountFrequency (%)
ν 3
 
9.7%
ο 3
 
9.7%
ρ 2
 
6.5%
τ 2
 
6.5%
ι 2
 
6.5%
η 2
 
6.5%
λ 2
 
6.5%
Ε 2
 
6.5%
Κ 2
 
6.5%
γ 1
 
3.2%
Other values (10) 10
32.3%
Han
ValueCountFrequency (%)
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
1
 
3.8%
1
 
3.8%
1
 
3.8%
Other values (9) 9
34.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1610607
99.6%
None 5560
 
0.3%
Cyrillic 373
 
< 0.1%
Hangul 113
 
< 0.1%
CJK 26
 
< 0.1%
Punctuation 6
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
142959
 
8.9%
' 140576
 
8.7%
i 106400
 
6.6%
e 93948
 
5.8%
n 89480
 
5.6%
o 84817
 
5.3%
r 83218
 
5.2%
t 83083
 
5.2%
a 76920
 
4.8%
s 62156
 
3.9%
Other values (76) 647050
40.2%
None
ValueCountFrequency (%)
é 3058
55.0%
ó 416
 
7.5%
á 317
 
5.7%
í 172
 
3.1%
ñ 150
 
2.7%
ü 148
 
2.7%
ä 139
 
2.5%
ö 134
 
2.4%
è 129
 
2.3%
ô 128
 
2.3%
Other values (75) 769
 
13.8%
Cyrillic
ValueCountFrequency (%)
и 34
 
9.1%
о 28
 
7.5%
а 26
 
7.0%
л 22
 
5.9%
н 20
 
5.4%
м 19
 
5.1%
т 17
 
4.6%
ь 16
 
4.3%
е 16
 
4.3%
с 16
 
4.3%
Other values (36) 159
42.6%
Hangul
ValueCountFrequency (%)
9
 
8.0%
8
 
7.1%
6
 
5.3%
5
 
4.4%
5
 
4.4%
5
 
4.4%
5
 
4.4%
5
 
4.4%
4
 
3.5%
3
 
2.7%
Other values (42) 58
51.3%
Punctuation
ValueCountFrequency (%)
3
50.0%
2
33.3%
1
 
16.7%
CJK
ValueCountFrequency (%)
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
1
 
3.8%
1
 
3.8%
1
 
3.8%
Other values (9) 9
34.6%

production_countries
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct: 2387
Distinct (%): 5.2%
Missing: 3
Missing (%): < 0.1%
Memory size: 355.9 KiB
['United States of America']
17873 
[]
6295 
['United Kingdom']
2241 
['France']
 
1657
['Japan']
 
1360
Other values (2382)
16113 

Length

Max length: 289
Median length: 199
Mean length: 20.591515
Min length: 2

Characters and Unicode

Total characters: 937717
Distinct characters: 55
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1761
Unique (%): 3.9%

Sample

1st row: ['United States of America']
2nd row: ['United States of America']
3rd row: ['United States of America']
4th row: ['United States of America']
5th row: ['United States of America']

Common Values

Value | Count | Frequency (%)
['United States of America'] | 17873 | 39.2%
[] | 6295 | 13.8%
['United Kingdom'] | 2241 | 4.9%
['France'] | 1657 | 3.6%
['Japan'] | 1360 | 3.0%
['Italy'] | 1030 | 2.3%
['Canada'] | 842 | 1.8%
['Germany'] | 752 | 1.7%
['India'] | 735 | 1.6%
['Russia'] | 735 | 1.6%
Other values (2377) | 12019 | 26.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
united 25313
20.2%
states 21183
16.9%
of 21182
16.9%
america 21182
16.9%
6295
 
5.0%
kingdom 4103
 
3.3%
france 3957
 
3.2%
germany 2272
 
1.8%
italy 2175
 
1.7%
canada 1766
 
1.4%
Other values (173) 15864
12.7%

Most occurring characters

ValueCountFrequency (%)
' 99086
 
10.6%
e 80807
 
8.6%
79753
 
8.5%
t 72742
 
7.8%
a 70650
 
7.5%
i 58657
 
6.3%
n 47625
 
5.1%
] 45539
 
4.9%
[ 45539
 
4.9%
d 34624
 
3.7%
Other values (45) 302695
32.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 559726
59.7%
Other Punctuation 109385
 
11.7%
Uppercase Letter 97775
 
10.4%
Space Separator 79753
 
8.5%
Close Punctuation 45539
 
4.9%
Open Punctuation 45539
 
4.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 80807
14.4%
t 72742
13.0%
a 70650
12.6%
i 58657
10.5%
n 47625
8.5%
d 34624
6.2%
r 32569
5.8%
o 29629
 
5.3%
m 28768
 
5.1%
c 26417
 
4.7%
Other values (16) 77238
13.8%
Uppercase Letter
ValueCountFrequency (%)
U 25414
26.0%
S 23880
24.4%
A 22424
22.9%
K 5232
 
5.4%
F 4358
 
4.5%
I 3598
 
3.7%
C 2593
 
2.7%
G 2485
 
2.5%
J 1670
 
1.7%
R 1305
 
1.3%
Other values (14) 4816
 
4.9%
Other Punctuation
ValueCountFrequency (%)
' 99086
90.6%
, 10299
 
9.4%
Space Separator
ValueCountFrequency (%)
79753
100.0%
Close Punctuation
ValueCountFrequency (%)
] 45539
100.0%
Open Punctuation
ValueCountFrequency (%)
[ 45539
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 657501
70.1%
Common 280216
29.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 80807
12.3%
t 72742
11.1%
a 70650
10.7%
i 58657
 
8.9%
n 47625
 
7.2%
d 34624
 
5.3%
r 32569
 
5.0%
o 29629
 
4.5%
m 28768
 
4.4%
c 26417
 
4.0%
Other values (40) 175013
26.6%
Common
ValueCountFrequency (%)
' 99086
35.4%
79753
28.5%
] 45539
16.3%
[ 45539
16.3%
, 10299
 
3.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 937717
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
' 99086
 
10.6%
e 80807
 
8.6%
79753
 
8.5%
t 72742
 
7.8%
a 70650
 
7.5%
i 58657
 
6.3%
n 47625
 
5.1%
] 45539
 
4.9%
[ 45539
 
4.9%
d 34624
 
3.7%
Other values (45) 302695
32.3%

release_date
Categorical

Distinct: 17333
Distinct (%): 38.1%
Missing: 90
Missing (%): 0.2%
Memory size: 355.9 KiB
2008-01-01
 
136
2009-01-01
 
121
2007-01-01
 
120
2005-01-01
 
111
2006-01-01
 
101
Other values (17328)
44863 

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 454520
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8569
Unique (%): 18.9%

Sample

1st row: 1995-10-30
2nd row: 1995-12-15
3rd row: 1995-12-22
4th row: 1995-12-22
5th row: 1995-02-10
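
Every value here is a 10-character string in what looks like `YYYY-MM-DD` form, and the column is typed as Categorical rather than as a date. A minimal sketch of converting such strings to real dates with the standard library is shown below. The `%Y-%m-%d` format is inferred from the sample rows, and the helper names `parse_release_date` and `is_january_first` are hypothetical. The second helper flags January 1 dates, which make up the ten most common values in this column and may be placeholders for "year known, exact day unknown" (an interpretation the report does not confirm).

```python
from datetime import datetime


def parse_release_date(text):
    """Return a date for 'YYYY-MM-DD' strings, or None for missing or invalid cells."""
    if not isinstance(text, str):
        return None
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        return None


def is_january_first(d):
    """True when the parsed date falls on January 1."""
    return d is not None and d.month == 1 and d.day == 1


raw = ["1995-10-30", "1995-12-15", "2008-01-01", None, "not a date"]
parsed = [parse_release_date(value) for value in raw]
print(parsed)
print(sum(is_january_first(d) for d in parsed))
```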

Common Values

Value | Count | Frequency (%)
2008-01-01 | 136 | 0.3%
2009-01-01 | 121 | 0.3%
2007-01-01 | 120 | 0.3%
2005-01-01 | 111 | 0.2%
2006-01-01 | 101 | 0.2%
2002-01-01 | 96 | 0.2%
2004-01-01 | 90 | 0.2%
2001-01-01 | 84 | 0.2%
2003-01-01 | 76 | 0.2%
1997-01-01 | 70 | 0.2%
Other values (17323) | 44447 | 97.6%
(Missing) | 90 | 0.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2008-01-01 136
 
0.3%
2009-01-01 121
 
0.3%
2007-01-01 120
 
0.3%
2005-01-01 111
 
0.2%
2006-01-01 101
 
0.2%
2002-01-01 96
 
0.2%
2004-01-01 90
 
0.2%
2001-01-01 84
 
0.2%
2003-01-01 76
 
0.2%
1997-01-01 70
 
0.2%
Other values (17323) 44447
97.8%

Most occurring characters

ValueCountFrequency (%)
0 97780
21.5%
- 90904
20.0%
1 84168
18.5%
2 52924
11.6%
9 39824
8.8%
3 15474
 
3.4%
8 15303
 
3.4%
6 15047
 
3.3%
5 14857
 
3.3%
7 14310
 
3.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 363616
80.0%
Dash Punctuation 90904
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 97780
26.9%
1 84168
23.1%
2 52924
14.6%
9 39824
11.0%
3 15474
 
4.3%
8 15303
 
4.2%
6 15047
 
4.1%
5 14857
 
4.1%
7 14310
 
3.9%
4 13929
 
3.8%
Dash Punctuation
ValueCountFrequency (%)
- 90904
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 454520
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 97780
21.5%
- 90904
20.0%
1 84168
18.5%
2 52924
11.6%
9 39824
8.8%
3 15474
 
3.4%
8 15303
 
3.4%
6 15047
 
3.3%
5 14857
 
3.3%
7 14310
 
3.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 454520
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 97780
21.5%
- 90904
20.0%
1 84168
18.5%
2 52924
11.6%
9 39824
8.8%
3 15474
 
3.4%
8 15303
 
3.4%
6 15047
 
3.3%
5 14857
 
3.3%
7 14310
 
3.1%

revenue
Real number (ℝ)

Distinct: 6863
Distinct (%): 15.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11196880
Minimum: 0
Maximum: 2.7879651 × 10^9
Zeros: 38114
Zeros (%): 83.7%
Negative: 0
Negative (%): 0.0%
Memory size: 355.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 47734422
Maximum: 2.7879651 × 10^9
Range: 2.7879651 × 10^9
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 64277481
Coefficient of variation (CV): 5.7406601
Kurtosis: 237.90628
Mean: 11196880
Median Absolute Deviation (MAD): 0
Skewness: 12.27583
Sum: 5.099283 × 10^11
Variance: 4.1315946 × 10^15
Monotonicity: Not monotonic
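
Because 83.7% of the revenue values are 0, the median, Q1, Q3, IQR and MAD above all collapse to 0 and say little about films that do report revenue. One possible reading, which the report itself does not confirm, is that 0 stands for "not recorded". Under that assumption, a minimal sketch of summarising only the non-zero values could look like this; the figures in `revenue` are made up for illustration and the helper name `nonzero` is hypothetical.

```python
from statistics import median


def nonzero(values):
    """Drop zeros, on the assumption that 0 means 'revenue not recorded'."""
    return [v for v in values if v != 0]


# Made-up values that mimic the shape of the column: mostly zeros, a few large numbers.
revenue = [0, 0, 0, 12000000, 0, 2000000, 0, 10000000]
reported = nonzero(revenue)

print(len(reported) / len(revenue))  # share of rows with a non-zero value
print(median(revenue))               # dominated by the zeros
print(median(reported))              # median among non-zero rows only
```

Whether dropping zeros is appropriate depends on how the dataset encodes unknown revenue, so the choice is worth checking against the data source before relying on it.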
Histogram with fixed size bins (bins=50)
Most frequent values
Value | Count | Frequency (%)
0 | 38114 | 83.7%
12000000 | 20 | < 0.1%
10000000 | 19 | < 0.1%
11000000 | 19 | < 0.1%
2000000 | 18 | < 0.1%
6000000 | 17 | < 0.1%
5000000 | 14 | < 0.1%
8000000 | 13 | < 0.1%
500000 | 13 | < 0.1%
14000000 | 12 | < 0.1%
Other values (6853) | 7283 | 16.0%

Smallest 10 values
Value | Count | Frequency (%)
0 | 38114 | 83.7%
1 | 12 | < 0.1%
2 | 3 | < 0.1%
3 | 9 | < 0.1%
4 | 4 | < 0.1%
5 | 5 | < 0.1%
6 | 2 | < 0.1%
7 | 4 | < 0.1%
8 | 5 | < 0.1%
9 | 1 | < 0.1%

Largest 10 values
Value | Count | Frequency (%)
2787965087 | 1 | < 0.1%
2068223624 | 1 | < 0.1%
1845034188 | 1 | < 0.1%
1519557910 | 1 | < 0.1%
1513528810 | 1 | < 0.1%
1506249360 | 1 | < 0.1%
1405403694 | 1 | < 0.1%
1342000000 | 1 | < 0.1%
1274219009 | 1 | < 0.1%
1262886337 | 1 | < 0.1%

runtime
Real number (ℝ)

Distinct: 353
Distinct (%): 0.8%
Missing: 263
Missing (%): 0.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 94.126438
Minimum: 0
Maximum: 1256
Zeros: 1559
Zeros (%): 3.4%
Negative: 0
Negative (%): 0.0%
Memory size: 355.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 11
Q1: 85
median: 95
Q3: 107
95-th percentile: 138
Maximum: 1256
Range: 1256
Interquartile range (IQR): 22

Descriptive statistics

Standard deviation: 38.398308
Coefficient of variation (CV): 0.40794392
Kurtosis: 93.155665
Mean: 94.126438
Median Absolute Deviation (MAD): 11
Skewness: 4.4608866
Sum: 4261951
Variance: 1474.4301
Monotonicity: Not monotonic
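
Several of the figures above are tied together by simple identities: the IQR is Q3 minus Q1, the range is the maximum minus the minimum, the coefficient of variation is the standard deviation divided by the mean, and the variance is the square of the standard deviation. The short check below plugs in the rounded values reported for runtime. It is only a consistency check on the printed numbers, not a recomputation from the data, and the variable names are chosen for this example.

```python
# Values copied from the runtime statistics in this report.
minimum, maximum = 0, 1256
q1, q3 = 85, 107
mean_runtime = 94.126438
std_runtime = 38.398308

iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
cv = std_runtime / mean_runtime  # coefficient of variation
variance = std_runtime ** 2      # variance as the squared standard deviation

print(iqr, value_range)
print(round(cv, 6))
print(round(variance, 2))
```

The results agree with the reported IQR, range, CV and variance up to rounding of the inputs.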
Histogram with fixed size bins (bins=50)
Most frequent values
Value | Count | Frequency (%)
90 | 2559 | 5.6%
0 | 1559 | 3.4%
100 | 1471 | 3.2%
95 | 1414 | 3.1%
93 | 1219 | 2.7%
96 | 1104 | 2.4%
92 | 1082 | 2.4%
94 | 1064 | 2.3%
91 | 1058 | 2.3%
88 | 1032 | 2.3%
Other values (343) | 31717 | 69.6%

Smallest 10 values
Value | Count | Frequency (%)
0 | 1559 | 3.4%
1 | 107 | 0.2%
2 | 34 | 0.1%
3 | 49 | 0.1%
4 | 51 | 0.1%
5 | 51 | 0.1%
6 | 72 | 0.2%
7 | 103 | 0.2%
8 | 78 | 0.2%
9 | 63 | 0.1%

Largest 10 values
Value | Count | Frequency (%)
1256 | 1 | < 0.1%
1140 | 2 | < 0.1%
931 | 1 | < 0.1%
925 | 1 | < 0.1%
900 | 1 | < 0.1%
877 | 1 | < 0.1%
874 | 1 | < 0.1%
840 | 2 | < 0.1%
780 | 1 | < 0.1%
720 | 1 | < 0.1%

spoken_languages
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct: 1843
Distinct (%): 4.0%
Missing: 6
Missing (%): < 0.1%
Memory size: 355.9 KiB
['English']
22425 
[]
3836 
['Français']
 
1859
['日本語']
 
1293
['Italiano']
 
1218
Other values (1838)
14905 

Length

Max length: 215
Median length: 11
Mean length: 12.926366
Min length: 2

Characters and Unicode

Total characters: 588615
Distinct characters: 176
Distinct categories: 10
Distinct scripts: 15
Distinct blocks: 16
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1293
Unique (%): 2.8%

Sample

1st row: ['English']
2nd row: ['English', 'Français']
3rd row: ['English']
4th row: ['English']
5th row: ['English']

Common Values

Value | Count | Frequency (%)
['English'] | 22425 | 49.2%
[] | 3836 | 8.4%
['Français'] | 1859 | 4.1%
['日本語'] | 1293 | 2.8%
['Italiano'] | 1218 | 2.7%
['Español'] | 902 | 2.0%
['Pусский'] | 807 | 1.8%
['Deutsch'] | 764 | 1.7%
['English', 'Français'] | 682 | 1.5%
['English', 'Español'] | 572 | 1.3%
Other values (1833) | 11178 | 24.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
english 28787
49.1%
4816
 
8.2%
français 4206
 
7.2%
deutsch 2628
 
4.5%
español 2413
 
4.1%
italiano 2369
 
4.0%
日本語 1762
 
3.0%
pусский 1563
 
2.7%
普通话 790
 
1.3%
हिन्दी 709
 
1.2%
Other values (69) 8593
 
14.7%

Most occurring characters

ValueCountFrequency (%)
' 106772
18.1%
[ 45536
 
7.7%
] 45536
 
7.7%
s 42367
 
7.2%
n 37543
 
6.4%
i 37190
 
6.3%
l 34695
 
5.9%
h 31521
 
5.4%
E 31257
 
5.3%
g 30474
 
5.2%
Other values (166) 145724
24.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 292685
49.7%
Other Punctuation 119575
20.3%
Uppercase Letter 46519
 
7.9%
Open Punctuation 45536
 
7.7%
Close Punctuation 45536
 
7.7%
Other Letter 22245
 
3.8%
Space Separator 13100
 
2.2%
Spacing Mark 1842
 
0.3%
Nonspacing Mark 1551
 
0.3%
Decimal Number 26
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 42367
14.5%
n 37543
12.8%
i 37190
12.7%
l 34695
11.9%
h 31521
10.8%
g 30474
10.4%
a 19015
6.5%
o 7067
 
2.4%
r 6144
 
2.1%
t 5985
 
2.0%
Other values (64) 40684
13.9%
Other Letter
ValueCountFrequency (%)
1762
 
7.9%
1762
 
7.9%
1762
 
7.9%
1263
 
5.7%
946
 
4.3%
790
 
3.6%
790
 
3.6%
709
 
3.2%
709
 
3.2%
709
 
3.2%
Other values (46) 11043
49.6%
Uppercase Letter
ValueCountFrequency (%)
E 31257
67.2%
F 4208
 
9.0%
D 2932
 
6.3%
P 2679
 
5.8%
I 2369
 
5.1%
N 833
 
1.8%
L 507
 
1.1%
M 363
 
0.8%
T 308
 
0.7%
Č 286
 
0.6%
Other values (13) 777
 
1.7%
Spacing Mark
ValueCountFrequency (%)
709
38.5%
ि 709
38.5%
136
 
7.4%
ி 111
 
6.0%
94
 
5.1%
47
 
2.6%
18
 
1.0%
18
 
1.0%
Nonspacing Mark
ValueCountFrequency (%)
709
45.7%
ִ 430
27.7%
ְ 215
 
13.9%
111
 
7.2%
68
 
4.4%
18
 
1.2%
Other Punctuation
ValueCountFrequency (%)
' 106772
89.3%
, 11686
 
9.8%
/ 1015
 
0.8%
\ 52
 
< 0.1%
? 50
 
< 0.1%
Open Punctuation
ValueCountFrequency (%)
[ 45536
100.0%
Close Punctuation
ValueCountFrequency (%)
] 45536
100.0%
Space Separator
ValueCountFrequency (%)
13100
100.0%
Decimal Number
ValueCountFrequency (%)
9 26
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 326809
55.5%
Common 223773
38.0%
Han 10494
 
1.8%
Cyrillic 10460
 
1.8%
Devanagari 4254
 
0.7%
Arabic 3366
 
0.6%
Hangul 3252
 
0.6%
Hebrew 1720
 
0.3%
Greek 1704
 
0.3%
Thai 1246
 
0.2%
Other values (5) 1537
 
0.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 42367
13.0%
n 37543
11.5%
i 37190
11.4%
l 34695
10.6%
h 31521
9.6%
E 31257
9.6%
g 30474
9.3%
a 19015
 
5.8%
o 7067
 
2.2%
r 6144
 
1.9%
Other values (51) 49536
15.2%
Cyrillic
ValueCountFrequency (%)
с 3213
30.7%
к 1735
16.6%
и 1680
16.1%
й 1616
15.4%
у 1565
15.0%
а 113
 
1.1%
р 87
 
0.8%
н 53
 
0.5%
ь 53
 
0.5%
У 53
 
0.5%
Other values (12) 292
 
2.8%
Arabic
ValueCountFrequency (%)
ا 541
16.1%
ر 541
16.1%
ب 342
10.2%
ة 342
10.2%
ي 342
10.2%
ع 342
10.2%
ل 342
10.2%
ی 144
 
4.3%
ف 144
 
4.3%
س 144
 
4.3%
Other values (5) 142
 
4.2%
Han
ValueCountFrequency (%)
1762
16.8%
1762
16.8%
1762
16.8%
1263
12.0%
946
9.0%
790
7.5%
790
7.5%
广 473
 
4.5%
473
 
4.5%
473
 
4.5%
Common
ValueCountFrequency (%)
' 106772
47.7%
[ 45536
20.3%
] 45536
20.3%
13100
 
5.9%
, 11686
 
5.2%
/ 1015
 
0.5%
\ 52
 
< 0.1%
? 50
 
< 0.1%
9 26
 
< 0.1%
Hebrew
ValueCountFrequency (%)
ִ 430
25.0%
י 215
12.5%
ע 215
12.5%
ב 215
12.5%
ְ 215
12.5%
ר 215
12.5%
ת 215
12.5%
Greek
ValueCountFrequency (%)
λ 426
25.0%
κ 213
12.5%
ε 213
12.5%
η 213
12.5%
ν 213
12.5%
ι 213
12.5%
ά 213
12.5%
Georgian
ValueCountFrequency (%)
33
14.3%
33
14.3%
33
14.3%
33
14.3%
33
14.3%
33
14.3%
33
14.3%
Devanagari
ValueCountFrequency (%)
709
16.7%
709
16.7%
709
16.7%
ि 709
16.7%
709
16.7%
709
16.7%
Hangul
ValueCountFrequency (%)
542
16.7%
542
16.7%
542
16.7%
542
16.7%
542
16.7%
542
16.7%
Thai
ValueCountFrequency (%)
356
28.6%
178
14.3%
178
14.3%
178
14.3%
178
14.3%
178
14.3%
Gurmukhi
ValueCountFrequency (%)
18
16.7%
18
16.7%
18
16.7%
18
16.7%
18
16.7%
18
16.7%
Telugu
ValueCountFrequency (%)
136
33.3%
68
16.7%
68
16.7%
68
16.7%
68
16.7%
Tamil
ValueCountFrequency (%)
111
20.0%
ி 111
20.0%
111
20.0%
111
20.0%
111
20.0%
Bengali
ValueCountFrequency (%)
94
40.0%
47
20.0%
47
20.0%
47
20.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 541734
92.0%
CJK 10494
 
1.8%
Cyrillic 10460
 
1.8%
None 10426
 
1.8%
Devanagari 4254
 
0.7%
Arabic 3366
 
0.6%
Hangul 3252
 
0.6%
Hebrew 1720
 
0.3%
Thai 1246
 
0.2%
Tamil 555
 
0.1%
Other values (6) 1108
 
0.2%

Most frequent character per block

ASCII
ValueCountFrequency (%)
' 106772
19.7%
[ 45536
8.4%
] 45536
8.4%
s 42367
 
7.8%
n 37543
 
6.9%
i 37190
 
6.9%
l 34695
 
6.4%
h 31521
 
5.8%
E 31257
 
5.8%
g 30474
 
5.6%
Other values (44) 98843
18.2%
None
ValueCountFrequency (%)
ç 4453
42.7%
ñ 2413
23.1%
ê 591
 
5.7%
λ 426
 
4.1%
Č 286
 
2.7%
ý 286
 
2.7%
ü 247
 
2.4%
κ 213
 
2.0%
ε 213
 
2.0%
η 213
 
2.0%
Other values (10) 1085
 
10.4%
Cyrillic
ValueCountFrequency (%)
с 3213
30.7%
к 1735
16.6%
и 1680
16.1%
й 1616
15.4%
у 1565
15.0%
а 113
 
1.1%
р 87
 
0.8%
н 53
 
0.5%
ь 53
 
0.5%
У 53
 
0.5%
Other values (12) 292
 
2.8%
CJK
ValueCountFrequency (%)
1762
16.8%
1762
16.8%
1762
16.8%
1263
12.0%
946
9.0%
790
7.5%
790
7.5%
广 473
 
4.5%
473
 
4.5%
473
 
4.5%
Devanagari
ValueCountFrequency (%)
709
16.7%
709
16.7%
709
16.7%
ि 709
16.7%
709
16.7%
709
16.7%
Hangul
ValueCountFrequency (%)
542
16.7%
542
16.7%
542
16.7%
542
16.7%
542
16.7%
542
16.7%
Arabic
ValueCountFrequency (%)
ا 541
16.1%
ر 541
16.1%
ب 342
10.2%
ة 342
10.2%
ي 342
10.2%
ع 342
10.2%
ل 342
10.2%
ی 144
 
4.3%
ف 144
 
4.3%
س 144
 
4.3%
Other values (5) 142
 
4.2%
Hebrew
ValueCountFrequency (%)
ִ 430
25.0%
י 215
12.5%
ע 215
12.5%
ב 215
12.5%
ְ 215
12.5%
ר 215
12.5%
ת 215
12.5%
Thai
ValueCountFrequency (%)
356
28.6%
178
14.3%
178
14.3%
178
14.3%
178
14.3%
178
14.3%
Telugu
ValueCountFrequency (%)
136
33.3%
68
16.7%
68
16.7%
68
16.7%
68
16.7%
Tamil
ValueCountFrequency (%)
111
20.0%
ி 111
20.0%
111
20.0%
111
20.0%
111
20.0%
Bengali
ValueCountFrequency (%)
94
40.0%
47
20.0%
47
20.0%
47
20.0%
Latin Ext Additional
ValueCountFrequency (%)
ế 61
50.0%
61
50.0%
Georgian
ValueCountFrequency (%)
33
14.3%
33
14.3%
33
14.3%
33
14.3%
33
14.3%
33
14.3%
33
14.3%
Gurmukhi
ValueCountFrequency (%)
18
16.7%
18
16.7%
18
16.7%
18
16.7%
18
16.7%
18
16.7%
IPA Ext
ValueCountFrequency (%)
ə 4
100.0%

status
Categorical

Distinct: 6
Distinct (%): < 0.1%
Missing: 87
Missing (%): 0.2%
Memory size: 355.9 KiB
Released
45088 
Rumored
 
232
Post Production
 
98
In Production
 
20
Planned
 
15

Length

Max length: 15
Median length: 8
Mean length: 8.0118579
Min length: 7

Characters and Unicode

Total characters: 364179
Distinct characters: 18
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Released
2nd row: Released
3rd row: Released
4th row: Released
5th row: Released

Common Values

Value | Count | Frequency (%)
Released | 45088 | 99.0%
Rumored | 232 | 0.5%
Post Production | 98 | 0.2%
In Production | 20 | < 0.1%
Planned | 15 | < 0.1%
Canceled | 2 | < 0.1%
(Missing) | 87 | 0.2%
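
The report's alerts describe status as highly imbalanced (96.9%). One common way to score class imbalance is one minus the normalised Shannon entropy of the class frequencies: 0 for a perfectly even split and values near 1 when a single class dominates. Whether the profiler uses exactly this formula is an assumption here. The sketch below applies it to the six non-missing status counts listed above; the function name `imbalance_score` is hypothetical.

```python
import math


def imbalance_score(counts):
    """1 - H / log2(k), where H is the Shannon entropy of the k class frequencies."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if len(probs) < 2:
        return 0.0  # a single class has no balance to measure
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(len(probs))


# Released, Rumored, Post Production, In Production, Planned, Canceled
status_counts = [45088, 232, 98, 20, 15, 2]
score = imbalance_score(status_counts)
print(round(score, 3))
```

For these counts the score comes out at roughly 0.97, which is in line with the figure in the alert.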

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
released 45088
98.9%
rumored 232
 
0.5%
production 118
 
0.3%
post 98
 
0.2%
in 20
 
< 0.1%
planned 15
 
< 0.1%
canceled 2
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
e 135515
37.2%
d 45455
 
12.5%
R 45320
 
12.4%
s 45186
 
12.4%
l 45105
 
12.4%
a 45105
 
12.4%
o 566
 
0.2%
r 350
 
0.1%
u 350
 
0.1%
m 232
 
0.1%
Other values (8) 995
 
0.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 318488
87.5%
Uppercase Letter 45573
 
12.5%
Space Separator 118
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 135515
42.5%
d 45455
 
14.3%
s 45186
 
14.2%
l 45105
 
14.2%
a 45105
 
14.2%
o 566
 
0.2%
r 350
 
0.1%
u 350
 
0.1%
m 232
 
0.1%
t 216
 
0.1%
Other values (3) 408
 
0.1%
Uppercase Letter
ValueCountFrequency (%)
R 45320
99.4%
P 231
 
0.5%
I 20
 
< 0.1%
C 2
 
< 0.1%
Space Separator
ValueCountFrequency (%)
118
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 364061
> 99.9%
Common 118
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 135515
37.2%
d 45455
 
12.5%
R 45320
 
12.4%
s 45186
 
12.4%
l 45105
 
12.4%
a 45105
 
12.4%
o 566
 
0.2%
r 350
 
0.1%
u 350
 
0.1%
m 232
 
0.1%
Other values (7) 877
 
0.2%
Common
ValueCountFrequency (%)
118
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 364179
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 135515
37.2%
d 45455
 
12.5%
R 45320
 
12.4%
s 45186
 
12.4%
l 45105
 
12.4%
a 45105
 
12.4%
o 566
 
0.2%
r 350
 
0.1%
u 350
 
0.1%
m 232
 
0.1%
Other values (8) 995
 
0.3%

tagline
Categorical

HIGH CARDINALITY  MISSING  UNIFORM 

Distinct: 20283
Distinct (%): 99.2%
Missing: 25103
Missing (%): 55.1%
Memory size: 355.9 KiB
Which one is the first to return - memory or the murderer?
 
9
Based on a true story.
 
7
Pokémon: Spell of the Unknown
 
4
There is no solitude greater than that of the Samurai
 
4
A love, a hope, a wall.
 
4
Other values (20278)
20411 

Length

Max length: 297
Median length: 204
Mean length: 47.006752
Min length: 1

Characters and Unicode

Total characters: 960771
Distinct characters: 170
Distinct categories: 17
Distinct scripts: 6
Distinct blocks: 10
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 20174
Unique (%): 98.7%

Sample

1st row: Roll the dice and unleash the excitement!
2nd row: Still Yelling. Still Fighting. Still Ready for Love.
3rd row: Friends are the people who let you be yourself... and never let you forget it.
4th row: Just When His World Is Back To Normal... He's In For The Surprise Of His Life!
5th row: A Los Angeles Crime Saga

Common Values

Value | Count | Frequency (%)
Which one is the first to return - memory or the murderer? | 9 | < 0.1%
Based on a true story. | 7 | < 0.1%
Pokémon: Spell of the Unknown | 4 | < 0.1%
There is no solitude greater than that of the Samurai | 4 | < 0.1%
A love, a hope, a wall. | 4 | < 0.1%
Trust no one. | 4 | < 0.1%
Every woman who has loved will understand | 4 | < 0.1%
Some things are better left top secret. | 4 | < 0.1%
From the very beginning, they knew they'd be friends to the end. What they didn't count on was everything in between. | 4 | < 0.1%
- | 4 | < 0.1%
Other values (20273) | 20391 | 44.8%
(Missing) | 25103 | 55.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the 11031
 
6.3%
a 6831
 
3.9%
of 4412
 
2.5%
to 3594
 
2.1%
is 2808
 
1.6%
in 2698
 
1.5%
and 2688
 
1.5%
you 2392
 
1.4%
1591
 
0.9%
for 1525
 
0.9%
Other values (15108) 134744
77.3%

Most occurring characters

ValueCountFrequency (%)
154023
16.0%
e 94648
 
9.9%
t 57409
 
6.0%
o 56689
 
5.9%
a 51572
 
5.4%
n 47624
 
5.0%
i 46141
 
4.8%
r 45119
 
4.7%
s 42435
 
4.4%
h 37258
 
3.9%
Other values (160) 327853
34.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 682053
71.0%
Space Separator 154023
 
16.0%
Uppercase Letter 75091
 
7.8%
Other Punctuation 44643
 
4.6%
Decimal Number 2687
 
0.3%
Dash Punctuation 1954
 
0.2%
Final Punctuation 98
 
< 0.1%
Open Punctuation 56
 
< 0.1%
Close Punctuation 55
 
< 0.1%
Currency Symbol 37
 
< 0.1%
Other values (7) 74
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 94648
13.9%
t 57409
 
8.4%
o 56689
 
8.3%
a 51572
 
7.6%
n 47624
 
7.0%
i 46141
 
6.8%
r 45119
 
6.6%
s 42435
 
6.2%
h 37258
 
5.5%
l 30231
 
4.4%
Other values (43) 172927
25.4%
Other Letter
ValueCountFrequency (%)
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
Other values (24) 24
70.6%
Uppercase Letter
ValueCountFrequency (%)
T 10017
 
13.3%
A 6885
 
9.2%
S 5663
 
7.5%
H 4407
 
5.9%
I 4388
 
5.8%
E 4311
 
5.7%
W 3691
 
4.9%
O 3482
 
4.6%
N 3201
 
4.3%
L 3198
 
4.3%
Other values (20) 25848
34.4%
Other Punctuation
ValueCountFrequency (%)
. 26674
59.7%
! 5785
 
13.0%
' 5680
 
12.7%
, 4239
 
9.5%
? 1167
 
2.6%
" 582
 
1.3%
148
 
0.3%
: 140
 
0.3%
& 84
 
0.2%
* 42
 
0.1%
Other values (7) 102
 
0.2%
Decimal Number
ValueCountFrequency (%)
0 802
29.8%
1 516
19.2%
2 299
 
11.1%
9 208
 
7.7%
3 208
 
7.7%
5 168
 
6.3%
4 140
 
5.2%
7 121
 
4.5%
6 121
 
4.5%
8 104
 
3.9%
Math Symbol
ValueCountFrequency (%)
+ 5
35.7%
= 5
35.7%
| 2
 
14.3%
~ 1
 
7.1%
1
 
7.1%
Dash Punctuation
ValueCountFrequency (%)
- 1937
99.1%
9
 
0.5%
8
 
0.4%
Final Punctuation
ValueCountFrequency (%)
82
83.7%
15
 
15.3%
» 1
 
1.0%
Initial Punctuation
ValueCountFrequency (%)
14
73.7%
4
 
21.1%
« 1
 
5.3%
Open Punctuation
ValueCountFrequency (%)
( 49
87.5%
[ 7
 
12.5%
Close Punctuation
ValueCountFrequency (%)
) 48
87.3%
] 7
 
12.7%
Other Number
ValueCountFrequency (%)
½ 2
66.7%
² 1
33.3%
Modifier Letter
ValueCountFrequency (%)
ˌ 1
50.0%
ˈ 1
50.0%
Space Separator
ValueCountFrequency (%)
(space) 154023
100.0%
Currency Symbol
ValueCountFrequency (%)
$ 37
100.0%
Nonspacing Mark
ValueCountFrequency (%)
1
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 757144
78.8%
Common 203592
 
21.2%
Han 21
 
< 0.1%
Tamil 5
 
< 0.1%
Hiragana 5
 
< 0.1%
Katakana 4
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 94648
 
12.5%
t 57409
 
7.6%
o 56689
 
7.5%
a 51572
 
6.8%
n 47624
 
6.3%
i 46141
 
6.1%
r 45119
 
6.0%
s 42435
 
5.6%
h 37258
 
4.9%
l 30231
 
4.0%
Other values (73) 248018
32.8%
Common
ValueCountFrequency (%)
(space) 154023
75.7%
. 26674
 
13.1%
! 5785
 
2.8%
' 5680
 
2.8%
, 4239
 
2.1%
- 1937
 
1.0%
? 1167
 
0.6%
0 802
 
0.4%
" 582
 
0.3%
1 516
 
0.3%
Other values (42) 2187
 
1.1%
Han
ValueCountFrequency (%)
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
Other values (11) 11
52.4%
Tamil
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%
Hiragana
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%
Katakana
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 960339
> 99.9%
Punctuation 280
 
< 0.1%
None 112
 
< 0.1%
CJK 21
 
< 0.1%
Tamil 5
 
< 0.1%
Hiragana 5
 
< 0.1%
Katakana 4
 
< 0.1%
IPA Ext 2
 
< 0.1%
Modifier Letters 2
 
< 0.1%
Math Operators 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
(space) 154023
16.0%
e 94648
 
9.9%
t 57409
 
6.0%
o 56689
 
5.9%
a 51572
 
5.4%
n 47624
 
5.0%
i 46141
 
4.8%
r 45119
 
4.7%
s 42435
 
4.4%
h 37258
 
3.9%
Other values (78) 327421
34.1%
Punctuation
ValueCountFrequency (%)
148
52.9%
82
29.3%
15
 
5.4%
14
 
5.0%
9
 
3.2%
8
 
2.9%
4
 
1.4%
None
ValueCountFrequency (%)
é 20
17.9%
ä 16
14.3%
ö 8
 
7.1%
ó 6
 
5.4%
á 6
 
5.4%
ü 5
 
4.5%
ı 5
 
4.5%
í 5
 
4.5%
· 4
 
3.6%
ć 3
 
2.7%
Other values (26) 34
30.4%
IPA Ext
ValueCountFrequency (%)
ə 2
100.0%
CJK
ValueCountFrequency (%)
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
Other values (11) 11
52.4%
Tamil
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%
Katakana
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%
Modifier Letters
ValueCountFrequency (%)
ˌ 1
50.0%
ˈ 1
50.0%
Hiragana
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%
Math Operators
ValueCountFrequency (%)
1
100.0%

title
Categorical

HIGH CARDINALITY  UNIFORM 

Distinct 42277
Distinct (%) 92.8%
Missing 6
Missing (%) < 0.1%
Memory size 355.9 KiB
Blackout 13
Cinderella 11
Alice in Wonderland 9
Hamlet 9
Beauty and the Beast 8
Other values (42272) 45486

Length

Max length 105
Median length 79
Mean length 16.707265
Min length 1

Characters and Unicode

Total characters 760782
Distinct characters 287
Distinct categories 17
Distinct scripts 7
Distinct blocks 12
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 39935
Unique (%) 87.7%
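The report lists both a Distinct count (42277) and a smaller Unique count (39935) for this variable. One reading consistent with those numbers, offered here as an assumption rather than the report's documented definition, is that Distinct counts the different values present while Unique counts the values that occur exactly once. A minimal sketch of that distinction, with a hypothetical helper name and made-up input:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, unique) over the non-missing values.

    distinct: number of different values
    unique:   number of values that occur exactly once
    """
    counts = Counter(v for v in values if v is not None)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Hypothetical column: one repeated title, two singletons, one missing cell.
titles = ["Hamlet", "Hamlet", "Toy Story", "Jumanji", None]
distinct, unique = distinct_and_unique(titles)
print(distinct, unique)
```

Under this reading, the percentages are taken over non-missing rows: 42277 / (45542 - 6) is about 92.8% and 39935 / (45542 - 6) is about 87.7%, which matches the figures shown.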

Sample

1st row: Toy Story
2nd row: Jumanji
3rd row: Grumpier Old Men
4th row: Waiting to Exhale
5th row: Father of the Bride Part II

Common Values

Value Count Frequency (%)
Blackout 13 < 0.1%
Cinderella 11 < 0.1%
Alice in Wonderland 9 < 0.1%
Hamlet 9 < 0.1%
Beauty and the Beast 8 < 0.1%
King Lear 8 < 0.1%
Les Misérables 8 < 0.1%
The Promise 8 < 0.1%
The Three Musketeers 7 < 0.1%
A Christmas Carol 7 < 0.1%
Other values (42267) 45448 99.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the 14593
 
10.7%
of 4952
 
3.6%
a 2251
 
1.6%
in 1697
 
1.2%
and 1640
 
1.2%
to 1057
 
0.8%
766
 
0.6%
man 665
 
0.5%
love 664
 
0.5%
for 602
 
0.4%
Other values (24431) 107791
78.9%

Most occurring characters

ValueCountFrequency (%)
(space) 91164
 
12.0%
e 76538
 
10.1%
a 49135
 
6.5%
o 45856
 
6.0%
n 40986
 
5.4%
r 40160
 
5.3%
i 39898
 
5.2%
t 36835
 
4.8%
s 29641
 
3.9%
h 28607
 
3.8%
Other values (277) 281962
37.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 536254
70.5%
Uppercase Letter 117663
 
15.5%
Space Separator 91164
 
12.0%
Other Punctuation 10524
 
1.4%
Decimal Number 3873
 
0.5%
Dash Punctuation 990
 
0.1%
Close Punctuation 87
 
< 0.1%
Open Punctuation 85
 
< 0.1%
Final Punctuation 38
 
< 0.1%
Other Letter 25
 
< 0.1%
Other values (7) 79
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 76538
14.3%
a 49135
9.2%
o 45856
 
8.6%
n 40986
 
7.6%
r 40160
 
7.5%
i 39898
 
7.4%
t 36835
 
6.9%
s 29641
 
5.5%
h 28607
 
5.3%
l 26042
 
4.9%
Other values (121) 122556
22.9%
Uppercase Letter
ValueCountFrequency (%)
T 16055
13.6%
S 10365
 
8.8%
M 8046
 
6.8%
B 7691
 
6.5%
C 7194
 
6.1%
A 6815
 
5.8%
D 6368
 
5.4%
L 5890
 
5.0%
H 5183
 
4.4%
W 5183
 
4.4%
Other values (65) 38873
33.0%
Other Letter
ValueCountFrequency (%)
ی 2
 
8.0%
چ 2
 
8.0%
ه 2
 
8.0%
ک 2
 
8.0%
ª 1
 
4.0%
1
 
4.0%
1
 
4.0%
1
 
4.0%
1
 
4.0%
ا 1
 
4.0%
Other values (11) 11
44.0%
Other Punctuation
ValueCountFrequency (%)
: 3735
35.5%
' 2512
23.9%
. 1604
15.2%
, 1139
 
10.8%
! 648
 
6.2%
& 460
 
4.4%
? 269
 
2.6%
/ 80
 
0.8%
* 19
 
0.2%
# 13
 
0.1%
Other values (8) 45
 
0.4%
Decimal Number
ValueCountFrequency (%)
2 864
22.3%
1 703
18.2%
0 619
16.0%
3 484
12.5%
9 232
 
6.0%
4 231
 
6.0%
5 227
 
5.9%
7 196
 
5.1%
8 161
 
4.2%
6 156
 
4.0%
Math Symbol
ValueCountFrequency (%)
+ 17
70.8%
× 3
 
12.5%
= 1
 
4.2%
1
 
4.2%
1
 
4.2%
1
 
4.2%
Other Number
ValueCountFrequency (%)
½ 12
63.2%
² 3
 
15.8%
³ 2
 
10.5%
1
 
5.3%
1
 
5.3%
Other Symbol
ValueCountFrequency (%)
° 3
37.5%
2
25.0%
1
 
12.5%
1
 
12.5%
1
 
12.5%
Currency Symbol
ValueCountFrequency (%)
$ 18
85.7%
¢ 2
 
9.5%
£ 1
 
4.8%
Dash Punctuation
ValueCountFrequency (%)
- 975
98.5%
15
 
1.5%
Close Punctuation
ValueCountFrequency (%)
) 82
94.3%
] 5
 
5.7%
Open Punctuation
ValueCountFrequency (%)
( 80
94.1%
[ 5
 
5.9%
Final Punctuation
ValueCountFrequency (%)
37
97.4%
1
 
2.6%
Initial Punctuation
ValueCountFrequency (%)
1
50.0%
1
50.0%
Space Separator
ValueCountFrequency (%)
(space) 91164
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 3
100.0%
Format
ValueCountFrequency (%)
2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 653387
85.9%
Common 106840
 
14.0%
Cyrillic 361
 
< 0.1%
Greek 170
 
< 0.1%
Arabic 11
 
< 0.1%
Katakana 8
 
< 0.1%
Han 5
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 76538
 
11.7%
a 49135
 
7.5%
o 45856
 
7.0%
n 40986
 
6.3%
r 40160
 
6.1%
i 39898
 
6.1%
t 36835
 
5.6%
s 29641
 
4.5%
h 28607
 
4.4%
l 26042
 
4.0%
Other values (107) 239689
36.7%
Common
ValueCountFrequency (%)
(space) 91164
85.3%
: 3735
 
3.5%
' 2512
 
2.4%
. 1604
 
1.5%
, 1139
 
1.1%
- 975
 
0.9%
2 864
 
0.8%
1 703
 
0.7%
! 648
 
0.6%
0 619
 
0.6%
Other values (50) 2877
 
2.7%
Cyrillic
ValueCountFrequency (%)
е 33
 
9.1%
о 32
 
8.9%
а 32
 
8.9%
н 26
 
7.2%
и 24
 
6.6%
р 23
 
6.4%
к 17
 
4.7%
в 16
 
4.4%
с 15
 
4.2%
т 14
 
3.9%
Other values (38) 129
35.7%
Greek
ValueCountFrequency (%)
α 20
 
11.8%
ι 14
 
8.2%
ο 14
 
8.2%
τ 9
 
5.3%
λ 8
 
4.7%
ά 8
 
4.7%
ρ 8
 
4.7%
ν 7
 
4.1%
ε 6
 
3.5%
π 6
 
3.5%
Other values (32) 70
41.2%
Katakana
ValueCountFrequency (%)
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
Arabic
ValueCountFrequency (%)
ی 2
18.2%
چ 2
18.2%
ه 2
18.2%
ک 2
18.2%
ا 1
9.1%
س 1
9.1%
ج 1
9.1%
Han
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 759185
99.8%
None 1141
 
0.1%
Cyrillic 361
 
< 0.1%
Punctuation 62
 
< 0.1%
Arabic 11
 
< 0.1%
Katakana 8
 
< 0.1%
CJK 5
 
< 0.1%
Misc Symbols 3
 
< 0.1%
Letterlike Symbols 2
 
< 0.1%
Math Operators 2
 
< 0.1%
Other values (2) 2
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
(space) 91164
 
12.0%
e 76538
 
10.1%
a 49135
 
6.5%
o 45856
 
6.0%
n 40986
 
5.4%
r 40160
 
5.3%
i 39898
 
5.3%
t 36835
 
4.9%
s 29641
 
3.9%
h 28607
 
3.8%
Other values (76) 280365
36.9%
None
ValueCountFrequency (%)
é 222
19.5%
ä 129
 
11.3%
ö 58
 
5.1%
è 54
 
4.7%
ô 44
 
3.9%
ü 39
 
3.4%
ó 37
 
3.2%
á 35
 
3.1%
ı 35
 
3.1%
à 33
 
2.9%
Other values (108) 455
39.9%
Punctuation
ValueCountFrequency (%)
37
59.7%
15
24.2%
5
 
8.1%
2
 
3.2%
1
 
1.6%
1
 
1.6%
1
 
1.6%
Cyrillic
ValueCountFrequency (%)
е 33
 
9.1%
о 32
 
8.9%
а 32
 
8.9%
н 26
 
7.2%
и 24
 
6.6%
р 23
 
6.4%
к 17
 
4.7%
в 16
 
4.4%
с 15
 
4.2%
т 14
 
3.9%
Other values (38) 129
35.7%
Arabic
ValueCountFrequency (%)
ی 2
18.2%
چ 2
18.2%
ه 2
18.2%
ک 2
18.2%
ا 1
9.1%
س 1
9.1%
ج 1
9.1%
Misc Symbols
ValueCountFrequency (%)
2
66.7%
1
33.3%
CJK
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%
Number Forms
ValueCountFrequency (%)
1
100.0%
Letterlike Symbols
ValueCountFrequency (%)
1
50.0%
1
50.0%
Katakana
ValueCountFrequency (%)
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
Math Operators
ValueCountFrequency (%)
1
50.0%
1
50.0%
Arrows
ValueCountFrequency (%)
1
100.0%

vote_average
Real number (ℝ)

Distinct 92
Distinct (%) 0.2%
Missing 6
Missing (%) < 0.1%
Infinite 0
Infinite (%) 0.0%
Mean 5.6181087
Minimum 0
Maximum 10
Zeros 3005
Zeros (%) 6.6%
Negative 0
Negative (%) 0.0%
Memory size 355.9 KiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 5
Median 6
Q3 6.8
95-th percentile 7.8
Maximum 10
Range 10
Interquartile range (IQR) 1.8

Descriptive statistics

Standard deviation 1.9243622
Coefficient of variation (CV) 0.34252848
Kurtosis 2.5009355
Mean 5.6181087
Median Absolute Deviation (MAD) 0.9
Skewness -1.5193805
Sum 255826.2
Variance 3.70317
Monotonicity Not monotonic
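Several of the rows above are derived from others, and they can be cross-checked by hand. The sketch below redoes that arithmetic for vote_average using the figures printed in this report; the variable names are illustrative only.

```python
import math

# Figures copied from the vote_average tables in this report.
q1, q3 = 5.0, 6.8
minimum, maximum = 0.0, 10.0
mean, std = 5.6181087, 1.9243622

iqr = q3 - q1                    # interquartile range: Q3 - Q1
value_range = maximum - minimum  # range: maximum - minimum
cv = std / mean                  # coefficient of variation: std / mean
variance = std ** 2              # variance: std squared

print(round(iqr, 4), value_range, round(cv, 8), round(variance, 5))
```

The results agree with the reported IQR (1.8), Range (10), CV (0.34252848) and Variance (3.70317) up to rounding.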
Histogram with fixed size bins (bins=50)
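For readers unfamiliar with fixed-size binning, the following is a minimal standard-library sketch of the idea: the span from the minimum to the maximum is cut into 50 equal-width intervals and each value is counted in the interval it falls into. The function name and the input list are hypothetical, and the report's own plotting code may differ in details such as edge handling.

```python
def fixed_width_histogram(values, bins=50):
    """Count values into `bins` equal-width intervals spanning [min, max].

    Assumes `values` contains at least two different numbers.
    """
    lo, hi = min(values), max(values)
    counts = [0] * bins
    for v in values:
        # Scale v to a bin index; the maximum is placed in the last bin.
        idx = min(int((v - lo) * bins / (hi - lo)), bins - 1)
        counts[idx] += 1
    return counts

votes = [0, 0, 5, 6, 6.5, 7, 10]  # hypothetical ratings
hist = fixed_width_histogram(votes, bins=50)
print(hist[0], hist[-1], sum(hist))
```

With a 0 to 10 range and 50 bins, each bin is 0.2 wide, so the large spike of zeros in this variable occupies the first bin on its own.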
ValueCountFrequency (%)
0 3005
 
6.6%
6 2471
 
5.4%
5 2009
 
4.4%
7 1888
 
4.1%
6.5 1722
 
3.8%
6.3 1605
 
3.5%
5.5 1383
 
3.0%
5.8 1370
 
3.0%
6.4 1354
 
3.0%
6.7 1351
 
3.0%
Other values (82) 27378
60.1%
ValueCountFrequency (%)
0 3005
6.6%
0.5 13
 
< 0.1%
0.7 1
 
< 0.1%
1 105
 
0.2%
1.1 1
 
< 0.1%
1.2 4
 
< 0.1%
1.3 13
 
< 0.1%
1.4 5
 
< 0.1%
1.5 30
 
0.1%
1.6 6
 
< 0.1%
ValueCountFrequency (%)
10 190
0.4%
9.8 1
 
< 0.1%
9.6 1
 
< 0.1%
9.5 18
 
< 0.1%
9.4 3
 
< 0.1%
9.3 18
 
< 0.1%
9.2 4
 
< 0.1%
9.1 3
 
< 0.1%
9 160
0.4%
8.9 7
 
< 0.1%

vote_count
Real number (ℝ)

Distinct 1820
Distinct (%) 4.0%
Missing 6
Missing (%) < 0.1%
Infinite 0
Infinite (%) 0.0%
Mean 109.78872
Minimum 0
Maximum 14075
Zeros 2906
Zeros (%) 6.4%
Negative 0
Negative (%) 0.0%
Memory size 355.9 KiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 3
Median 10
Q3 34
95-th percentile 433
Maximum 14075
Range 14075
Interquartile range (IQR) 31

Descriptive statistics

Standard deviation 490.91574
Coefficient of variation (CV) 4.471459
Kurtosis 151.45026
Mean 109.78872
Median Absolute Deviation (MAD) 8
Skewness 10.458626
Sum 4999339
Variance 240998.27
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1 3268
 
7.2%
2 3133
 
6.9%
0 2906
 
6.4%
3 2799
 
6.1%
4 2482
 
5.4%
5 2099
 
4.6%
6 1747
 
3.8%
7 1574
 
3.5%
8 1360
 
3.0%
9 1195
 
2.6%
Other values (1810) 22973
50.4%
ValueCountFrequency (%)
0 2906
6.4%
1 3268
7.2%
2 3133
6.9%
3 2799
6.1%
4 2482
5.4%
5 2099
4.6%
6 1747
3.8%
7 1574
3.5%
8 1360
3.0%
9 1195
 
2.6%
ValueCountFrequency (%)
14075 1
< 0.1%
12269 1
< 0.1%
12114 1
< 0.1%
12000 1
< 0.1%
11444 1
< 0.1%
11187 1
< 0.1%
10297 1
< 0.1%
10014 1
< 0.1%
9678 1
< 0.1%
9634 1
< 0.1%

release_year
Real number (ℝ)

Distinct 135
Distinct (%) 0.3%
Missing 90
Missing (%) 0.2%
Infinite 0
Infinite (%) 0.0%
Mean 1991.8826
Minimum 1874
Maximum 2020
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 355.9 KiB

Quantile statistics

Minimum 1874
5-th percentile 1941
Q1 1978
Median 2001
Q3 2010
95-th percentile 2015
Maximum 2020
Range 146
Interquartile range (IQR) 32

Descriptive statistics

Standard deviation 24.05775
Coefficient of variation (CV) 0.012077896
Kurtosis 0.84069867
Mean 1991.8826
Median Absolute Deviation (MAD) 12
Skewness -1.2253957
Sum 90535047
Variance 578.77535
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
2014 1976
 
4.3%
2015 1907
 
4.2%
2013 1895
 
4.2%
2012 1727
 
3.8%
2011 1669
 
3.7%
2016 1604
 
3.5%
2009 1591
 
3.5%
2010 1501
 
3.3%
2008 1482
 
3.3%
2007 1322
 
2.9%
Other values (125) 28778
63.2%
ValueCountFrequency (%)
1874 1
 
< 0.1%
1878 1
 
< 0.1%
1883 1
 
< 0.1%
1887 1
 
< 0.1%
1888 2
 
< 0.1%
1890 5
 
< 0.1%
1891 6
< 0.1%
1892 3
 
< 0.1%
1893 1
 
< 0.1%
1894 13
< 0.1%
ValueCountFrequency (%)
2020 1
 
< 0.1%
2018 5
 
< 0.1%
2017 532
 
1.2%
2016 1604
3.5%
2015 1907
4.2%
2014 1976
4.3%
2013 1895
4.2%
2012 1727
3.8%
2011 1669
3.7%
2010 1501
3.3%

return
Real number (ℝ)

INFINITE  MISSING  ZEROS 

Distinct 5233
Distinct (%) 47.8%
Missing 34592
Missing (%) 76.0%
Infinite 2035
Infinite (%) 4.5%
Mean inf
Minimum 0
Maximum inf
Zeros 3522
Zeros (%) 7.7%
Negative 0
Negative (%) 0.0%
Memory size 355.9 KiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
Median 1.2728531
Q3 7.2645927
95-th percentile nan
Maximum inf
Range inf
Interquartile range (IQR) 7.2645927

Descriptive statistics

Standard deviation nan
Coefficient of variation (CV) nan
Kurtosis nan
Mean inf
Median Absolute Deviation (MAD) 1.2728531
Skewness nan
Sum inf
Variance nan
Monotonicity Not monotonic
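The inf and nan entries above follow from the 2035 infinite values in this column: a single infinite value is enough to make the sum and the mean infinite, and the spread measures built on the mean then become undefined. This section does not say how return was computed; infinite values would be consistent with a ratio whose denominator is zero, but that is an assumption. The sketch below uses made-up numbers to show the effect and one possible remedy, which is to summarise the finite values only.

```python
import math
import statistics

# Hypothetical ratios: three finite values and one infinite value.
returns = [0.0, 1.2728531, 7.2645927, math.inf]

raw_mean = statistics.fmean(returns)  # inf: the infinite value dominates

# Keep only finite entries before computing summary statistics.
finite = [r for r in returns if math.isfinite(r)]
finite_mean = statistics.fmean(finite)
finite_std = statistics.stdev(finite)

print(raw_mean, round(finite_mean, 4), round(finite_std, 4))
```

Whether dropping, capping or flagging the infinite rows is appropriate depends on what they represent in the source data.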
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
0 3522 7.7%
inf 2035 4.5%
1 20 < 0.1%
2 12 < 0.1%
4 11 < 0.1%
5 8 < 0.1%
3 7 < 0.1%
1.333333333 7 < 0.1%
2.5 7 < 0.1%
1.5 6 < 0.1%
Other values (5223) 5315 11.7%
(Missing) 34592 76.0%
Value Count Frequency (%)
0 3522 7.7%
5.217391304 × 10^-7 1 < 0.1%
7.5 × 10^-7 1 < 0.1%
9.375 × 10^-7 1 < 0.1%
1.499133126 × 10^-6 1 < 0.1%
1.8 × 10^-6 1 < 0.1%
1.916666667 × 10^-6 1 < 0.1%
3.5 × 10^-6 1 < 0.1%
4 × 10^-6 1 < 0.1%
5.111111111 × 10^-6 1 < 0.1%
Value Count Frequency (%)
inf 2035 4.5%
12396383 1 < 0.1%
8500000 1 < 0.1%
4197476.625 1 < 0.1%
2755584 1 < 0.1%
1018619.283 1 < 0.1%
1000000 1 < 0.1%
26881.72043 1 < 0.1%
12890.38667 1 < 0.1%
5330.33945 1 < 0.1%

cast
Categorical

Distinct 42663
Distinct (%) 93.7%
Missing 4
Missing (%) < 0.1%
Memory size 355.9 KiB
[] 2430
['Georges Méliès'] 28
['Louis Theroux'] 15
['Mel Blanc'] 12
['Petteri Summanen', 'Ismo Kallio', 'Eppu Salminen', 'Irina Björklund', 'Hannu-Pekka Björkman', 'Jenni Banerjee', 'Mikko Leppilampi', 'Lena Meriläinen', 'Mari Perankoski', 'Risto Kaskilahti'] 9
Other values (42658) 43044

Length

Max length 5099
Median length 1498
Mean length 211.89187
Min length 2

Characters and Unicode

Total characters 9649132
Distinct characters 394
Distinct categories 14
Distinct scripts 9
Distinct blocks 10
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 42456
Unique (%) 93.2%

Sample

1st row['Tom Hanks', 'Tim Allen', 'Don Rickles', 'Jim Varney', 'Wallace Shawn', 'John Ratzenberger', 'Annie Potts', 'John Morris', 'Erik von Detten', 'Laurie Metcalf', 'R. Lee Ermey', 'Sarah Freeman', 'Penn Jillette']
2nd row['Robin Williams', 'Jonathan Hyde', 'Kirsten Dunst', 'Bradley Pierce', 'Bonnie Hunt', 'Bebe Neuwirth', 'David Alan Grier', 'Patricia Clarkson', 'Adam Hann-Byrd', 'Laura Bell Bundy', 'James Handy', 'Gillian Barber', 'Brandon Obray', 'Cyrus Thiedeke', 'Gary Joseph Thorup', 'Leonard Zola', 'Lloyd Berry', 'Malcolm Stewart', 'Annabel Kershaw', 'Darryl Henriques', 'Robyn Driscoll', 'Peter Bryant', 'Sarah Gilson', 'Florica Vlad', 'June Lion', 'Brenda Lockmuller']
3rd row['Walter Matthau', 'Jack Lemmon', 'Ann-Margret', 'Sophia Loren', 'Daryl Hannah', 'Burgess Meredith', 'Kevin Pollak']
4th row['Whitney Houston', 'Angela Bassett', 'Loretta Devine', 'Lela Rochon', 'Gregory Hines', 'Dennis Haysbert', 'Michael Beach', 'Mykelti Williamson', 'Lamont Johnson', 'Wesley Snipes']
5th row['Steve Martin', 'Diane Keaton', 'Martin Short', 'Kimberly Williams-Paisley', 'George Newbern', 'Kieran Culkin', 'BD Wong', 'Peter Michael Goetz', 'Kate McGregor-Stewart', 'Jane Adams', 'Eugene Levy', 'Lori Alan']
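The sample rows suggest that each cast cell holds the text of a Python list literal rather than an actual list, which would also explain why the variable is profiled as Categorical with a minimum length of 2 (the empty list `[]`). Assuming that is the case, a cell can be turned back into a list with `ast.literal_eval`, which evaluates literals only and does not execute arbitrary code. The helper name below is hypothetical.

```python
import ast

def parse_name_list(cell):
    """Turn a cell such as "['Tom Hanks', 'Tim Allen']" into a Python list."""
    names = ast.literal_eval(cell)
    if not isinstance(names, list):
        raise ValueError(f"expected a list literal, got {type(names).__name__}")
    return names

# The 3rd sample row of this variable, as it appears in the report.
row = "['Walter Matthau', 'Jack Lemmon', 'Ann-Margret', 'Sophia Loren', 'Daryl Hannah', 'Burgess Meredith', 'Kevin Pollak']"
cast = parse_name_list(row)
print(len(cast), cast[0])
```

Once parsed, per-person counts are more informative than the whole-string frequencies shown in this section.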

Common Values

ValueCountFrequency (%)
[] 2430
 
5.3%
['Georges Méliès'] 28
 
0.1%
['Louis Theroux'] 15
 
< 0.1%
['Mel Blanc'] 12
 
< 0.1%
['Petteri Summanen', 'Ismo Kallio', 'Eppu Salminen', 'Irina Björklund', 'Hannu-Pekka Björkman', 'Jenni Banerjee', 'Mikko Leppilampi', 'Lena Meriläinen', 'Mari Perankoski', 'Risto Kaskilahti'] 9
 
< 0.1%
['Jimmy Carr'] 9
 
< 0.1%
['David Attenborough'] 8
 
< 0.1%
['Louis C.K.'] 8
 
< 0.1%
['George Carlin'] 8
 
< 0.1%
['Werner Herzog'] 8
 
< 0.1%
Other values (42653) 43003
94.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
john 9723
 
0.8%
michael 7392
 
0.6%
david 6147
 
0.5%
james 5630
 
0.5%
robert 5628
 
0.5%
richard 4402
 
0.4%
paul 4309
 
0.4%
peter 3820
 
0.3%
george 3356
 
0.3%
william 3340
 
0.3%
Other values (112107) 1103904
95.4%

Most occurring characters

ValueCountFrequency (%)
' 1115724
 
11.6%
(space) 1112172
 
11.5%
a 700269
 
7.3%
e 659492
 
6.8%
n 519048
 
5.4%
, 515082
 
5.3%
r 493142
 
5.1%
i 480013
 
5.0%
o 419851
 
4.4%
l 362263
 
3.8%
Other values (384) 3272076
33.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 5607519
58.1%
Other Punctuation 1646903
 
17.1%
Uppercase Letter 1176548
 
12.2%
Space Separator 1112172
 
11.5%
Open Punctuation 45561
 
0.5%
Close Punctuation 45547
 
0.5%
Dash Punctuation 14098
 
0.1%
Other Letter 543
 
< 0.1%
Decimal Number 113
 
< 0.1%
Final Punctuation 83
 
< 0.1%
Other values (4) 45
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 700269
12.5%
e 659492
11.8%
n 519048
9.3%
r 493142
 
8.8%
i 480013
 
8.6%
o 419851
 
7.5%
l 362263
 
6.5%
s 254773
 
4.5%
t 251970
 
4.5%
h 196689
 
3.5%
Other values (138) 1270009
22.6%
Other Letter
ValueCountFrequency (%)
ا 32
 
5.9%
م 31
 
5.7%
ی 19
 
3.5%
ع 19
 
3.5%
ن 18
 
3.3%
د 17
 
3.1%
ر 17
 
3.1%
17
 
3.1%
ي 16
 
2.9%
12
 
2.2%
Other values (104) 345
63.5%
Uppercase Letter
ValueCountFrequency (%)
M 108635
 
9.2%
S 91734
 
7.8%
C 83023
 
7.1%
J 82744
 
7.0%
B 81492
 
6.9%
A 69895
 
5.9%
R 66833
 
5.7%
D 64308
 
5.5%
L 60860
 
5.2%
G 54401
 
4.6%
Other values (81) 412623
35.1%
Other Punctuation
ValueCountFrequency (%)
' 1115724
67.7%
, 515082
31.3%
. 15881
 
1.0%
" 127
 
< 0.1%
\ 62
 
< 0.1%
· 9
 
< 0.1%
: 6
 
< 0.1%
& 6
 
< 0.1%
! 5
 
< 0.1%
/ 1
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
0 44
38.9%
5 37
32.7%
2 14
 
12.4%
1 7
 
6.2%
9 3
 
2.7%
4 2
 
1.8%
3 2
 
1.8%
7 2
 
1.8%
8 1
 
0.9%
6 1
 
0.9%
Nonspacing Mark
ValueCountFrequency (%)
́ 10
58.8%
2
 
11.8%
1
 
5.9%
1
 
5.9%
1
 
5.9%
1
 
5.9%
1
 
5.9%
Open Punctuation
ValueCountFrequency (%)
[ 45538
99.9%
14
 
< 0.1%
( 9
 
< 0.1%
Final Punctuation
ValueCountFrequency (%)
74
89.2%
6
 
7.2%
» 3
 
3.6%
Close Punctuation
ValueCountFrequency (%)
] 45538
> 99.9%
) 9
 
< 0.1%
Initial Punctuation
ValueCountFrequency (%)
20
87.0%
« 3
 
13.0%
Space Separator
ValueCountFrequency (%)
(space) 1112172
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 14098
100.0%
Currency Symbol
ValueCountFrequency (%)
$ 3
100.0%
Modifier Symbol
ValueCountFrequency (%)
´ 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 6780983
70.3%
Common 2864505
29.7%
Cyrillic 3070
 
< 0.1%
Han 276
 
< 0.1%
Arabic 241
 
< 0.1%
Thai 27
 
< 0.1%
Greek 14
 
< 0.1%
Inherited 10
 
< 0.1%
Hangul 6
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 700269
 
10.3%
e 659492
 
9.7%
n 519048
 
7.7%
r 493142
 
7.3%
i 480013
 
7.1%
o 419851
 
6.2%
l 362263
 
5.3%
s 254773
 
3.8%
t 251970
 
3.7%
h 196689
 
2.9%
Other values (163) 2443473
36.0%
Han
ValueCountFrequency (%)
17
 
6.2%
12
 
4.3%
11
 
4.0%
11
 
4.0%
11
 
4.0%
11
 
4.0%
11
 
4.0%
11
 
4.0%
9
 
3.3%
9
 
3.3%
Other values (55) 163
59.1%
Cyrillic
ValueCountFrequency (%)
а 323
 
10.5%
и 315
 
10.3%
о 233
 
7.6%
н 229
 
7.5%
р 215
 
7.0%
е 174
 
5.7%
л 155
 
5.0%
к 136
 
4.4%
т 115
 
3.7%
с 109
 
3.6%
Other values (51) 1066
34.7%
Common
ValueCountFrequency (%)
' 1115724
38.9%
(space) 1112172
38.8%
, 515082
18.0%
] 45538
 
1.6%
[ 45538
 
1.6%
. 15881
 
0.6%
- 14098
 
0.5%
" 127
 
< 0.1%
74
 
< 0.1%
\ 62
 
< 0.1%
Other values (24) 209
 
< 0.1%
Arabic
ValueCountFrequency (%)
ا 32
13.3%
م 31
12.9%
ی 19
 
7.9%
ع 19
 
7.9%
ن 18
 
7.5%
د 17
 
7.1%
ر 17
 
7.1%
ي 16
 
6.6%
ل 9
 
3.7%
ب 8
 
3.3%
Other values (18) 55
22.8%
Thai
ValueCountFrequency (%)
2
 
7.4%
2
 
7.4%
2
 
7.4%
2
 
7.4%
2
 
7.4%
2
 
7.4%
1
 
3.7%
1
 
3.7%
1
 
3.7%
1
 
3.7%
Other values (11) 11
40.7%
Hangul
ValueCountFrequency (%)
1
16.7%
1
16.7%
1
16.7%
1
16.7%
1
16.7%
1
16.7%
Greek
ValueCountFrequency (%)
ν 6
42.9%
Ζ 2
 
14.3%
α 2
 
14.3%
ί 2
 
14.3%
ο 2
 
14.3%
Inherited
ValueCountFrequency (%)
́ 10
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 9607234
99.6%
None 38098
 
0.4%
Cyrillic 3070
 
< 0.1%
CJK 276
 
< 0.1%
Arabic 241
 
< 0.1%
Punctuation 114
 
< 0.1%
Latin Ext Additional 56
 
< 0.1%
Thai 27
 
< 0.1%
Diacriticals 10
 
< 0.1%
Hangul 6
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
' 1115724
 
11.6%
(space) 1112172
 
11.6%
a 700269
 
7.3%
e 659492
 
6.9%
n 519048
 
5.4%
, 515082
 
5.4%
r 493142
 
5.1%
i 480013
 
5.0%
o 419851
 
4.4%
l 362263
 
3.8%
Other values (68) 3230178
33.6%
None
ValueCountFrequency (%)
é 9068
23.8%
á 4140
 
10.9%
í 2725
 
7.2%
ô 2294
 
6.0%
ö 2039
 
5.4%
ó 1873
 
4.9%
ü 1497
 
3.9%
ć 1296
 
3.4%
è 1243
 
3.3%
ä 1002
 
2.6%
Other values (110) 10921
28.7%
Cyrillic
ValueCountFrequency (%)
а 323
 
10.5%
и 315
 
10.3%
о 233
 
7.6%
н 229
 
7.5%
р 215
 
7.0%
е 174
 
5.7%
л 155
 
5.0%
к 136
 
4.4%
т 115
 
3.7%
с 109
 
3.6%
Other values (51) 1066
34.7%
Punctuation
ValueCountFrequency (%)
74
64.9%
20
 
17.5%
14
 
12.3%
6
 
5.3%
Arabic
ValueCountFrequency (%)
ا 32
13.3%
م 31
12.9%
ی 19
 
7.9%
ع 19
 
7.9%
ن 18
 
7.5%
د 17
 
7.1%
ر 17
 
7.1%
ي 16
 
6.6%
ل 9
 
3.7%
ب 8
 
3.3%
Other values (18) 55
22.8%
CJK
ValueCountFrequency (%)
17
 
6.2%
12
 
4.3%
11
 
4.0%
11
 
4.0%
11
 
4.0%
11
 
4.0%
11
 
4.0%
11
 
4.0%
9
 
3.3%
9
 
3.3%
Other values (55) 163
59.1%
Latin Ext Additional
ValueCountFrequency (%)
15
26.8%
9
16.1%
6
 
10.7%
6
 
10.7%
ế 5
 
8.9%
4
 
7.1%
4
 
7.1%
4
 
7.1%
2
 
3.6%
1
 
1.8%
Diacriticals
ValueCountFrequency (%)
́ 10
100.0%
Thai
ValueCountFrequency (%)
2
 
7.4%
2
 
7.4%
2
 
7.4%
2
 
7.4%
2
 
7.4%
2
 
7.4%
1
 
3.7%
1
 
3.7%
1
 
3.7%
1
 
3.7%
Other values (11) 11
40.7%
Hangul
ValueCountFrequency (%)
1
16.7%
1
16.7%
1
16.7%
1
16.7%
1
16.7%
1
16.7%

crew
Categorical

Distinct 42899
Distinct (%) 94.2%
Missing 4
Missing (%) < 0.1%
Memory size 355.9 KiB
[] 805
['Georges Méliès'] 36
['Christian I. Nyby II'] 13
['Gerald Thomas', 'Talbot Rothwell'] 13
['Frederick Wiseman'] 12
Other values (42894) 44659

Length

Max length 7473
Median length 2323
Mean length 178.30596
Min length 2

Characters and Unicode

Total characters 8119697
Distinct characters 359
Distinct categories 14
Distinct scripts 8
Distinct blocks 9
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 41752
Unique (%) 91.7%

Sample

1st row['John Lasseter', 'Joss Whedon', 'Andrew Stanton', 'Joel Cohen', 'Alec Sokolow', 'Bonnie Arnold', 'Ed Catmull', 'Ralph Guggenheim', 'Steve Jobs', 'Lee Unkrich', 'Ralph Eggleston', 'Robert Gordon', 'Mary Helen Leasman', 'Kim Blanchette', 'Marilyn McCoppen', 'Randy Newman', 'Dale E. Grahn', 'Robin Cooper', 'John Lasseter', 'Pete Docter', 'Joe Ranft', 'Patsy Bouge', 'Norm DeCarlo', 'Ash Brannon', 'Randy Newman', 'Roman Figun', 'Don Davis', 'James Flamberg', 'Mary Beth Smith', 'Rick Mackay', 'Susan Bradley', 'William Reeves', 'Randy Newman', 'Andrew Stanton', 'Pete Docter', 'Gary Rydstrom', 'Karen Robert Jackson', 'Chris Montan', 'Rich Quade', 'Michael Berenstein', 'Colin Brady', 'Davey Crockett Feiten', 'Angie Glocka', 'Rex Grignon', 'Tom K. Gurney', 'Jimmy Hayward', 'Hal T. Hickel', 'Karen Kiser', 'Anthony B. LaMolinara', 'Guionne Leroy', 'Bud Luckey', 'Les Major', 'Glenn McQueen', 'Mark Oftedal', 'Jeff Pidgeon', 'Jeff Pratt', 'Steve Rabatich', 'Roger Rose', 'Steve Segal', 'Doug Sheppeck', 'Alan Sperling', 'Doug Sweetland', 'David Tart', 'Ken Willard', 'Thomas Porter', 'Mark Thomas Henne', 'Oren Jacob', 'Darwyn Peachey', 'Mitch Prater', 'Brian M. Rosen', 'Sharon Calahan', 'Galyn Susman', 'William Cone', 'Shelley Daniels Lekven', 'Bob Pauley', 'Bud Luckey', 'Andrew Stanton', 'William Cone', 'Steve Johnson', 'Dan Haskett', 'Tom Holloway', 'Jean Gillmore', 'Desirée Mourad', 'Sonoko Konishi', 'Ann M. Rockwell', 'Julie M. McDonald', 'Robin Lee', 'Tom Freeman', 'Ada Cochavi', 'Dana Mulligan', 'Deirdre Morrison', 'Lori Lombardo', 'Ellen Devine', 'Lauren Beth Strogoff', 'Gary Rydstrom', 'Gary Summers', 'Tim Holland', 'Pat Jackson', 'Tom Myers', 'J.R. Grubbs', 'Susan Sanford', 'Susan Popovic', 'Dan Engstrom', 'Ruth Lambert', 'Mickie McGowan']
2nd row['Larry J. Franco', 'Jonathan Hensleigh', 'James Horner', 'Joe Johnston', 'Robert Dalva', 'Nancy Foy', 'Kyle Balda', 'James D. Bissell', 'Scott Kroopf', 'Ted Field', 'Robert W. Cort', 'Thomas E. Ackerman', 'Chris van Allsburg', 'William Teitler', 'Greg Taylor', 'Jim Strain']
3rd row['Howard Deutch', 'Mark Steven Johnson', 'Mark Steven Johnson', 'Jack Keller']
4th row['Forest Whitaker', 'Ronald Bass', 'Ronald Bass', 'Ezra Swerdlow', 'Deborah Schindler', 'Terry McMillan', 'Terry McMillan', 'Terry McMillan', 'Kenneth Edmonds', 'Caron K']
5th row['Alan Silvestri', 'Elliot Davis', 'Nancy Meyers', 'Nancy Meyers', 'Albert Hackett', 'Charles Shyer', 'Adam Bernardi']

Common Values

ValueCountFrequency (%)
[] 805
 
1.8%
['Georges Méliès'] 36
 
0.1%
['Christian I. Nyby II'] 13
 
< 0.1%
['Gerald Thomas', 'Talbot Rothwell'] 13
 
< 0.1%
['Frederick Wiseman'] 12
 
< 0.1%
['Charlie Chaplin', 'Charlie Chaplin'] 12
 
< 0.1%
['JP Siili', 'JP Siili'] 10
 
< 0.1%
['Stan Brakhage'] 10
 
< 0.1%
['James Benning'] 10
 
< 0.1%
['William K.L. Dickson ', 'William Heise'] 9
 
< 0.1%
Other values (42889) 44608
97.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
john 9989
 
1.0%
david 8649
 
0.9%
michael 8201
 
0.8%
robert 6732
 
0.7%
james 4997
 
0.5%
paul 4511
 
0.5%
peter 4495
 
0.5%
richard 4352
 
0.4%
mark 4192
 
0.4%
william 3943
 
0.4%
Other values (89565) 924021
93.9%

Most occurring characters

ValueCountFrequency (%)
(space) 938608
 
11.6%
' 923966
 
11.4%
a 555048
 
6.8%
e 554988
 
6.8%
r 431980
 
5.3%
n 427785
 
5.3%
, 417397
 
5.1%
i 399603
 
4.9%
o 352475
 
4.3%
l 293807
 
3.6%
Other values (349) 2824040
34.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 4697405
57.9%
Other Punctuation 1381886
 
17.0%
Uppercase Letter 1000334
 
12.3%
Space Separator 938608
 
11.6%
Open Punctuation 45549
 
0.6%
Close Punctuation 45549
 
0.6%
Dash Punctuation 10091
 
0.1%
Other Letter 206
 
< 0.1%
Decimal Number 51
 
< 0.1%
Final Punctuation 10
 
< 0.1%
Other values (4) 8
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 555048
11.8%
e 554988
11.8%
r 431980
9.2%
n 427785
 
9.1%
i 399603
 
8.5%
o 352475
 
7.5%
l 293807
 
6.3%
s 220274
 
4.7%
t 215697
 
4.6%
h 169949
 
3.6%
Other values (128) 1075799
22.9%
Other Letter
ValueCountFrequency (%)
ا 9
 
4.4%
8
 
3.9%
7
 
3.4%
م 7
 
3.4%
6
 
2.9%
6
 
2.9%
5
 
2.4%
د 5
 
2.4%
ی 4
 
1.9%
4
 
1.9%
Other values (88) 145
70.4%
Uppercase Letter
ValueCountFrequency (%)
M 90569
 
9.1%
S 83805
 
8.4%
J 74017
 
7.4%
B 68760
 
6.9%
C 66188
 
6.6%
R 60184
 
6.0%
A 59627
 
6.0%
D 56919
 
5.7%
L 50177
 
5.0%
G 48783
 
4.9%
Other values (80) 341305
34.1%
Other Punctuation
ValueCountFrequency (%)
' 923966
66.9%
, 417397
30.2%
. 40081
 
2.9%
\ 384
 
< 0.1%
" 38
 
< 0.1%
& 8
 
< 0.1%
! 4
 
< 0.1%
/ 3
 
< 0.1%
: 2
 
< 0.1%
· 2
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
5 16
31.4%
0 12
23.5%
9 7
13.7%
8 5
 
9.8%
3 4
 
7.8%
7 3
 
5.9%
2 2
 
3.9%
1 2
 
3.9%
Open Punctuation
ValueCountFrequency (%)
[ 45538
> 99.9%
( 11
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
] 45538
> 99.9%
) 11
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
- 10088
> 99.9%
3
 
< 0.1%
Final Punctuation
ValueCountFrequency (%)
8
80.0%
2
 
20.0%
Nonspacing Mark
ValueCountFrequency (%)
́ 2
50.0%
̃ 2
50.0%
Space Separator
ValueCountFrequency (%)
(space) 938608
100.0%
Initial Punctuation
ValueCountFrequency (%)
2
100.0%
Math Symbol
ValueCountFrequency (%)
| 1
100.0%
Modifier Symbol
ValueCountFrequency (%)
´ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 5696699
70.2%
Common 2421749
29.8%
Cyrillic 1006
 
< 0.1%
Hangul 133
 
< 0.1%
Arabic 52
 
< 0.1%
Greek 33
 
< 0.1%
Han 21
 
< 0.1%
Inherited 4
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 555048
 
9.7%
e 554988
 
9.7%
r 431980
 
7.6%
n 427785
 
7.5%
i 399603
 
7.0%
o 352475
 
6.2%
l 293807
 
5.2%
s 220274
 
3.9%
t 215697
 
3.8%
h 169949
 
3.0%
Other values (145) 2075093
36.4%
Hangul
ValueCountFrequency (%)
8
 
6.0%
7
 
5.3%
6
 
4.5%
6
 
4.5%
5
 
3.8%
4
 
3.0%
4
 
3.0%
4
 
3.0%
4
 
3.0%
4
 
3.0%
Other values (58) 81
60.9%
Cyrillic
ValueCountFrequency (%)
и 116
 
11.5%
а 92
 
9.1%
р 72
 
7.2%
о 66
 
6.6%
е 58
 
5.8%
л 56
 
5.6%
к 54
 
5.4%
н 54
 
5.4%
с 45
 
4.5%
в 44
 
4.4%
Other values (42) 349
34.7%
Common
ValueCountFrequency (%)
(space) 938608
38.8%
' 923966
38.2%
, 417397
17.2%
[ 45538
 
1.9%
] 45538
 
1.9%
. 40081
 
1.7%
- 10088
 
0.4%
\ 384
 
< 0.1%
" 38
 
< 0.1%
5 16
 
< 0.1%
Other values (22) 95
 
< 0.1%
Greek
ValueCountFrequency (%)
ς 4
 
12.1%
η 3
 
9.1%
α 3
 
9.1%
Γ 2
 
6.1%
Α 2
 
6.1%
ρ 2
 
6.1%
ι 2
 
6.1%
ά 2
 
6.1%
μ 2
 
6.1%
Φ 1
 
3.0%
Other values (10) 10
30.3%
Arabic
ValueCountFrequency (%)
ا 9
17.3%
م 7
13.5%
د 5
9.6%
ی 4
7.7%
ع 4
7.7%
ي 4
7.7%
ن 3
 
5.8%
ح 3
 
5.8%
ل 3
 
5.8%
و 2
 
3.8%
Other values (7) 8
15.4%
Han
ValueCountFrequency (%)
2
9.5%
2
9.5%
2
9.5%
2
9.5%
2
9.5%
2
9.5%
2
9.5%
2
9.5%
1
 
4.8%
1
 
4.8%
Other values (3) 3
14.3%
Inherited
Value               Count     Frequency (%)
́                   2         50.0%
̃                   2         50.0%

Most occurring blocks

Value                 Count     Frequency (%)
ASCII                 8090386   99.6%
None                  28073     0.3%
Cyrillic              1006      < 0.1%
Hangul                133       < 0.1%
Arabic                52        < 0.1%
CJK                   21        < 0.1%
Punctuation           15        < 0.1%
Latin Ext Additional  7         < 0.1%
Diacriticals          4         < 0.1%

Most frequent character per block

ASCII
Value               Count     Frequency (%)
(space)             938608    11.6%
'                   923966    11.4%
a                   555048    6.9%
e                   554988    6.9%
r                   431980    5.3%
n                   427785    5.3%
,                   417397    5.2%
i                   399603    4.9%
o                   352475    4.4%
l                   293807    3.6%
Other values (67)   2794729   34.5%
None
Value               Count     Frequency (%)
é                   7417      26.4%
á                   3259      11.6%
í                   2067      7.4%
ó                   1796      6.4%
ö                   1661      5.9%
ô                   1407      5.0%
ü                   964       3.4%
è                   902       3.2%
ç                   839       3.0%
ä                   776       2.8%
Other values (113)  6985      24.9%
Cyrillic
Value               Count     Frequency (%)
и                   116       11.5%
а                   92        9.1%
р                   72        7.2%
о                   66        6.6%
е                   58        5.8%
л                   56        5.6%
к                   54        5.4%
н                   54        5.4%
с                   45        4.5%
в                   44        4.4%
Other values (42)   349       34.7%
Arabic
Value               Count     Frequency (%)
ا                   9         17.3%
م                   7         13.5%
د                   5         9.6%
ی                   4         7.7%
ع                   4         7.7%
ي                   4         7.7%
ن                   3         5.8%
ح                   3         5.8%
ل                   3         5.8%
و                   2         3.8%
Other values (7)    8         15.4%
Hangul
Value               Count     Frequency (%)
                    8         6.0%
                    7         5.3%
                    6         4.5%
                    6         4.5%
                    5         3.8%
                    4         3.0%
                    4         3.0%
                    4         3.0%
                    4         3.0%
                    4         3.0%
Other values (58)   81        60.9%
Punctuation
Value               Count     Frequency (%)
                    8         53.3%
                    3         20.0%
                    2         13.3%
                    2         13.3%
Latin Ext Additional
Value               Count     Frequency (%)
                    5         71.4%
                    1         14.3%
                    1         14.3%
CJK
Value               Count     Frequency (%)
                    2         9.5%
                    2         9.5%
                    2         9.5%
                    2         9.5%
                    2         9.5%
                    2         9.5%
                    2         9.5%
                    2         9.5%
                    1         4.8%
                    1         4.8%
Other values (3)    3         14.3%
Diacriticals
Value               Count     Frequency (%)
́                   2         50.0%
̃                   2         50.0%

Interactions


Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
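
As a rough illustration of nullity correlation, the sketch below computes the Pearson correlation between the "is missing" indicators of two columns using only the standard library. The function name `nullity_correlation`, the toy rows and the use of `None` as the missing marker are assumptions made for this example. The report does not state how its heatmap is computed; with pandas, a comparable matrix is commonly obtained with `df.isna().corr()`.

```python
import math


def nullity_correlation(rows, col_a, col_b):
    """Pearson correlation between the missing-value indicators of two columns.

    `rows` is a list of dicts; a value of None (or an absent key) counts as missing.
    Returns NaN when either column is always or never missing.
    """
    a = [1.0 if r.get(col_a) is None else 0.0 for r in rows]
    b = [1.0 if r.get(col_b) is None else 0.0 for r in rows]
    n = len(rows)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return math.nan  # constant indicator: correlation is undefined
    return cov / math.sqrt(var_a * var_b)


# Toy data (not taken from the dataset): "a" and "b" are missing together,
# "c" is missing exactly when they are present, "d" is never missing.
rows = [
    {"a": None, "b": None, "c": 1, "d": 0},
    {"a": 1, "b": 2, "c": None, "d": 0},
    {"a": None, "b": None, "c": 3, "d": 0},
    {"a": 4, "b": 5, "c": None, "d": 0},
]

print(nullity_correlation(rows, "a", "b"))
print(nullity_correlation(rows, "a", "c"))
```

A value near 1 means two columns tend to be missing in the same rows, a value near -1 means one tends to be present when the other is missing, and columns with no variation in missingness have no defined value.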

Sample

belongs_to_collection | budget | genres | id | original_language | overview | popularity | poster_path | production_companies | production_countries | release_date | revenue | runtime | spoken_languages | status | tagline | title | vote_average | vote_count | release_year | return | cast | crew
0['Toy Story Collection']30000000.0['Animation', 'Comedy', 'Family']862enLed by Woody, Andy's toys live happily in his room until Andy's birthday brings Buzz Lightyear onto the scene. Afraid of losing his place in Andy's heart, Woody plots against Buzz. But when circumstances separate Buzz and Woody from their owner, the duo eventually learns to put aside their differences.21.946943/rhIRbceoE9lR4veEXuwCC2wARtG.jpg['Pixar Animation Studios']['United States of America']1995-10-30373554033.081.0['English']ReleasedNaNToy Story7.75415.01995.012.451801['Tom Hanks', 'Tim Allen', 'Don Rickles', 'Jim Varney', 'Wallace Shawn', 'John Ratzenberger', 'Annie Potts', 'John Morris', 'Erik von Detten', 'Laurie Metcalf', 'R. Lee Ermey', 'Sarah Freeman', 'Penn Jillette']['John Lasseter', 'Joss Whedon', 'Andrew Stanton', 'Joel Cohen', 'Alec Sokolow', 'Bonnie Arnold', 'Ed Catmull', 'Ralph Guggenheim', 'Steve Jobs', 'Lee Unkrich', 'Ralph Eggleston', 'Robert Gordon', 'Mary Helen Leasman', 'Kim Blanchette', 'Marilyn McCoppen', 'Randy Newman', 'Dale E. Grahn', 'Robin Cooper', 'John Lasseter', 'Pete Docter', 'Joe Ranft', 'Patsy Bouge', 'Norm DeCarlo', 'Ash Brannon', 'Randy Newman', 'Roman Figun', 'Don Davis', 'James Flamberg', 'Mary Beth Smith', 'Rick Mackay', 'Susan Bradley', 'William Reeves', 'Randy Newman', 'Andrew Stanton', 'Pete Docter', 'Gary Rydstrom', 'Karen Robert Jackson', 'Chris Montan', 'Rich Quade', 'Michael Berenstein', 'Colin Brady', 'Davey Crockett Feiten', 'Angie Glocka', 'Rex Grignon', 'Tom K. Gurney', 'Jimmy Hayward', 'Hal T. Hickel', 'Karen Kiser', 'Anthony B. LaMolinara', 'Guionne Leroy', 'Bud Luckey', 'Les Major', 'Glenn McQueen', 'Mark Oftedal', 'Jeff Pidgeon', 'Jeff Pratt', 'Steve Rabatich', 'Roger Rose', 'Steve Segal', 'Doug Sheppeck', 'Alan Sperling', 'Doug Sweetland', 'David Tart', 'Ken Willard', 'Thomas Porter', 'Mark Thomas Henne', 'Oren Jacob', 'Darwyn Peachey', 'Mitch Prater', 'Brian M. 
Rosen', 'Sharon Calahan', 'Galyn Susman', 'William Cone', 'Shelley Daniels Lekven', 'Bob Pauley', 'Bud Luckey', 'Andrew Stanton', 'William Cone', 'Steve Johnson', 'Dan Haskett', 'Tom Holloway', 'Jean Gillmore', 'Desirée Mourad', 'Sonoko Konishi', 'Ann M. Rockwell', 'Julie M. McDonald', 'Robin Lee', 'Tom Freeman', 'Ada Cochavi', 'Dana Mulligan', 'Deirdre Morrison', 'Lori Lombardo', 'Ellen Devine', 'Lauren Beth Strogoff', 'Gary Rydstrom', 'Gary Summers', 'Tim Holland', 'Pat Jackson', 'Tom Myers', 'J.R. Grubbs', 'Susan Sanford', 'Susan Popovic', 'Dan Engstrom', 'Ruth Lambert', 'Mickie McGowan']
1NaN65000000.0['Adventure', 'Fantasy', 'Family']8844enWhen siblings Judy and Peter discover an enchanted board game that opens the door to a magical world, they unwittingly invite Alan -- an adult who's been trapped inside the game for 26 years -- into their living room. Alan's only hope for freedom is to finish the game, which proves risky as all three find themselves running from giant rhinoceroses, evil monkeys and other terrifying creatures.17.015539/vzmL6fP7aPKNKPRTFnZmiUfciyV.jpg['TriStar Pictures', 'Teitler Film', 'Interscope Communications']['United States of America']1995-12-15262797249.0104.0['English', 'Français']ReleasedRoll the dice and unleash the excitement!Jumanji6.92413.01995.04.043035['Robin Williams', 'Jonathan Hyde', 'Kirsten Dunst', 'Bradley Pierce', 'Bonnie Hunt', 'Bebe Neuwirth', 'David Alan Grier', 'Patricia Clarkson', 'Adam Hann-Byrd', 'Laura Bell Bundy', 'James Handy', 'Gillian Barber', 'Brandon Obray', 'Cyrus Thiedeke', 'Gary Joseph Thorup', 'Leonard Zola', 'Lloyd Berry', 'Malcolm Stewart', 'Annabel Kershaw', 'Darryl Henriques', 'Robyn Driscoll', 'Peter Bryant', 'Sarah Gilson', 'Florica Vlad', 'June Lion', 'Brenda Lockmuller']['Larry J. Franco', 'Jonathan Hensleigh', 'James Horner', 'Joe Johnston', 'Robert Dalva', 'Nancy Foy', 'Kyle Balda', 'James D. Bissell', 'Scott Kroopf', 'Ted Field', 'Robert W. Cort', 'Thomas E. Ackerman', 'Chris van Allsburg', 'William Teitler', 'Greg Taylor', 'Jim Strain']
2['Grumpy Old Men Collection']0.0['Romance', 'Comedy']15602enA family wedding reignites the ancient feud between next-door neighbors and fishing buddies John and Max. Meanwhile, a sultry Italian divorcée opens a restaurant at the local bait shop, alarming the locals who worry she'll scare the fish away. But she's less interested in seafood than she is in cooking up a hot time with Max.11.7129/6ksm1sjKMFLbO7UY2i6G1ju9SML.jpg['Warner Bros.', 'Lancaster Gate']['United States of America']1995-12-220.0101.0['English']ReleasedStill Yelling. Still Fighting. Still Ready for Love.Grumpier Old Men6.592.01995.0NaN['Walter Matthau', 'Jack Lemmon', 'Ann-Margret', 'Sophia Loren', 'Daryl Hannah', 'Burgess Meredith', 'Kevin Pollak']['Howard Deutch', 'Mark Steven Johnson', 'Mark Steven Johnson', 'Jack Keller']
3NaN16000000.0['Comedy', 'Drama', 'Romance']31357enCheated on, mistreated and stepped on, the women are holding their breath, waiting for the elusive "good man" to break a string of less-than-stellar lovers. Friends and confidants Vannah, Bernie, Glo and Robin talk it all out, determined to find a better way to breathe.3.859495/16XOMpEaLWkrcPqSQqhTmeJuqQl.jpg['Twentieth Century Fox Film Corporation']['United States of America']1995-12-2281452156.0127.0['English']ReleasedFriends are the people who let you be yourself... and never let you forget it.Waiting to Exhale6.134.01995.05.090760['Whitney Houston', 'Angela Bassett', 'Loretta Devine', 'Lela Rochon', 'Gregory Hines', 'Dennis Haysbert', 'Michael Beach', 'Mykelti Williamson', 'Lamont Johnson', 'Wesley Snipes']['Forest Whitaker', 'Ronald Bass', 'Ronald Bass', 'Ezra Swerdlow', 'Deborah Schindler', 'Terry McMillan', 'Terry McMillan', 'Terry McMillan', 'Kenneth Edmonds', 'Caron K']
4['Father of the Bride Collection']0.0['Comedy']11862enJust when George Banks has recovered from his daughter's wedding, he receives the news that she's pregnant ... and that George's wife, Nina, is expecting too. He was planning on selling their home, but that's a plan that -- like George -- will have to change with the arrival of both a grandchild and a kid of his own.8.387519/e64sOI48hQXyru7naBFyssKFxVd.jpg['Sandollar Productions', 'Touchstone Pictures']['United States of America']1995-02-1076578911.0106.0['English']ReleasedJust When His World Is Back To Normal... He's In For The Surprise Of His Life!Father of the Bride Part II5.7173.01995.0inf['Steve Martin', 'Diane Keaton', 'Martin Short', 'Kimberly Williams-Paisley', 'George Newbern', 'Kieran Culkin', 'BD Wong', 'Peter Michael Goetz', 'Kate McGregor-Stewart', 'Jane Adams', 'Eugene Levy', 'Lori Alan']['Alan Silvestri', 'Elliot Davis', 'Nancy Meyers', 'Nancy Meyers', 'Albert Hackett', 'Charles Shyer', 'Adam Bernardi']
5NaN60000000.0['Action', 'Crime', 'Drama', 'Thriller']949enObsessive master thief, Neil McCauley leads a top-notch crew on various insane heists throughout Los Angeles while a mentally unstable detective, Vincent Hanna pursues him without rest. Each man recognizes and respects the ability and the dedication of the other even though they are aware their cat-and-mouse game may end in violence.17.924927/zMyfPUelumio3tiDKPffaUpsQTD.jpg['Regency Enterprises', 'Forward Pass', 'Warner Bros.']['United States of America']1995-12-15187436818.0170.0['English', 'Español']ReleasedA Los Angeles Crime SagaHeat7.71886.01995.03.123947['Al Pacino', 'Robert De Niro', 'Val Kilmer', 'Jon Voight', 'Tom Sizemore', 'Diane Venora', 'Amy Brenneman', 'Ashley Judd', 'Mykelti Williamson', 'Natalie Portman', 'Ted Levine', 'Tom Noonan', 'Tone Loc', 'Hank Azaria', 'Wes Studi', 'Dennis Haysbert', 'Danny Trejo', 'Henry Rollins', 'William Fichtner', 'Kevin Gage', 'Susan Traylor', 'Jerry Trimble', 'Ricky Harris', 'Jeremy Piven', 'Xander Berkeley', 'Begonya Plaza', 'Rick Avery', 'Hazelle Goodman', 'Ray Buktenica', 'Max Daniels', 'Vince Deadrick Jr.', 'Steven Ford', 'Farrah Forke', 'Patricia Healy', 'Paul Herman', 'Cindy Katz', 'Brian Libby', 'Dan Martin', 'Mario Roberts', 'Thomas Rosales, Jr.', 'Yvonne Zima', 'Mick Gould', 'Bud Cort', 'Viviane Vives', 'Kim Staunton', 'Martin Ferrero', 'Brad Baldridge', 'Andrew Camuccio', 'Kenny Endoso', 'Kimberly Flynn', 'Niki Harris', 'Bill McIntosh', 'Rick Marzan', 'Terry Miller', 'Kai Soremekun', 'Peter Blackwell', 'Trevor Coppola', 'Mary Kircher', 'Darin Mangan', 'Robert Miranda', 'Manny Perry', 'Iva Franks Singer', 'Tim Werner', 'Philip Ettington']['Michael Mann', 'Michael Mann', 'Art Linson', 'Michael Mann', 'Elliot Goldenthal', 'Dante Spinotti', 'Pasquale Buba', 'William Goldenberg', 'Dov Hoenig', 'Tom Rolf', 'Bonnie Timmermann', 'Neil Spisak', 'Margie Stone McShirley', 'Deborah Lynn Scott', 'Bill Abbott', 'Per Hallberg', 'Terry D. Frazee', 'Paul H. 
Haines Jr.', 'Neil Krepela', 'Joel Kramer', 'Tony Brubaker', 'Anne H. Ahrens', 'Darryl M. Athons', 'Cate Hardman', 'Jane Brody', 'Donald Frazee', 'Oscar Mazzola', 'Dianne Wager', 'Anthony Lattanzio', 'David Le Vey', 'Leonard Engelman', 'Ilona Herman', 'Vera Mitchell', 'John Caglione Jr.', 'Ken Diaz', 'Neal J. Anderson', 'Duncan Burns', 'Hector C. Gika', 'Larry Kemp', 'Lauren Stephens', 'Gary Jay', 'James Muro', 'Frank Connor', 'Duane Manwiller', 'Chris Moseley', 'Frank Dorowsky', 'Michael Connell', 'Budd Carr', 'Matthew Booth', 'Vicki Hiatt', 'Thomas R. Bryant', 'Ray Boniker', 'Anna Behlmer', 'Ron Bartlett', 'Chris Jenkins', 'Andy Nelson', 'Mark Smith', 'Mick Gould', 'Tim Werner', 'Pieter Jan Brugge', 'Gusmano Cesaretti', 'Arnon Milchan', 'Christopher Cronyn', 'Michael Waxman', 'Alison E. McBryde', 'Marsha Bozeman', 'Jeff Wells', 'Doug Coleman', 'Philip Rogers', 'Jimmy Webb']
6NaN58000000.0['Comedy', 'Romance']11860enAn ugly duckling having undergone a remarkable change, still harbors feelings for her crush: a carefree playboy, but not before his business-focused brother has something to say about it.6.677277/jQh15y5YB7bWz1NtffNZmRw0s9D.jpg['Paramount Pictures', 'Scott Rudin Productions', 'Mirage Enterprises', 'Sandollar Productions', 'Constellation Entertainment', 'Worldwide', 'Mont Blanc Entertainment GmbH']['Germany', 'United States of America']1995-12-150.0127.0['Français', 'English']ReleasedYou are cordially invited to the most surprising merger of the year.Sabrina6.2141.01995.00.000000['Harrison Ford', 'Julia Ormond', 'Greg Kinnear', 'Angie Dickinson', 'Nancy Marchand', 'John Wood', 'Richard Crenna', 'Lauren Holly', 'Dana Ivey', 'Fanny Ardant', 'Patrick Bruel', 'Paul Giamatti', 'Miriam Colón', 'Elizabeth Franz', 'Valérie Lemercier', 'Becky Ann Baker', 'John C. Vennema', 'Margo Martindale', 'J. Smith-Cameron', 'Christine Luneau-Lipton', 'Michael Dees', 'Denis Holmes', 'Jo-Jo Lowe', 'Ira Wheeler', 'Philippa Cooper', 'Ayako Kawahara', 'François Genty', 'Guillaume Gallienne', 'Inés Sastre', 'Phina Oruche', 'Andrea Behalikova', 'Jennifer Herrera', 'Kristina Kumlin', 'Eva Linderholm', 'Carmen Chaplin', 'Micheline Van de Velde', 'Joanna Rhodes', 'Alan Boone', 'Patrick Forster-Delmas', 'Kentaro Matsuo', 'Peter McKernan', 'Ed Connelly', 'Ronald L. Schwary', 'Alvin Lum', 'Siching Song', 'Phil Nee', 'Randy Becker', 'Susan Browning', 'Anthony Mondal', 'Peter Parks', 'Woodrow Asai', 'Eric Bruno Borgman', 'Michael Cline', 'Christopher Del Gaudio', 'Philippe Hartmann', 'Jerry Quinn', 'Dori Rosenthal']['Sydney Pollack', 'Barbara Benedek', 'Sydney Pollack', 'John Williams', 'Fredric Steinkamp', 'Scott Rudin', 'David Rubin', 'Brian Morris', 'David Rayfiel', 'Peter Robb-King', 'Bernadette Mazur', 'Joseph A. Campayno', 'Lynda Gurasich', 'Stephen G. 
Bishop', 'Gary Jones', 'Ann Roth', 'George DeTitta Jr.', 'Amy Marshall', 'Miriam Schapiro', 'Danny Michael', 'Adam Jenkins', 'Chris Jenkins', 'Scott Millan', 'Myron Nettinga', 'Mitch Gettleman', 'Joe Earle', 'J. Paul Huntsman', 'Andrew Schmetterling', 'Adam Sawelson', 'Barbara Issak', 'Benjamin Beardwood', 'Mary A. Kelly', 'Myles Aronowitz', 'Brian Hamill', 'Giovanni Fiore Coltellacci', 'Giuseppe Rotunno', 'Kate Dowd', 'Juliet Polcsa', 'Michelle Matland', 'Donna Maloney', 'Karl F. Steinkamp', 'Lindsay Doran', 'Ronald L. Schwary', 'John Kasarda', 'Jean-Pierre Avice', 'Thomas A. Imperato', 'Ronald L. Schwary', 'Bill Kaufman', 'Ronna Kress', 'Sandrine Ageorges', 'Joseph E. Iberti', 'Joanny Carpentier', 'Katherine Kennedy']
7NaN0.0['Action', 'Adventure', 'Drama', 'Family']45325enA mischievous young boy, Tom Sawyer, witnesses a murder by the deadly Injun Joe. Tom becomes friends with Huckleberry Finn, a boy with no future and no family. Tom has to choose between honoring a friendship or honoring an oath because the town alcoholic is accused of the murder. Tom and Huck go through several adventures trying to retrieve evidence.2.561161/sGO5Qa55p7wTu7FJcX4H4xIVKvS.jpg['Walt Disney Pictures']['United States of America']1995-12-220.097.0['English', 'Deutsch']ReleasedThe Original Bad Boys.Tom and Huck5.445.01995.0NaN['Jonathan Taylor Thomas', 'Brad Renfro', 'Rachael Leigh Cook', 'Michael McShane', 'Amy Wright', 'Eric Schweig', 'Tamara Mello']['David Loughery', 'Stephen Sommers', 'Peter Hewitt', 'Mark Twain']
8NaN35000000.0['Action', 'Adventure', 'Thriller']9091enInternational action superstar Jean Claude Van Damme teams with Powers Boothe in a Tension-packed, suspense thriller, set against the back-drop of a Stanley Cup game.Van Damme portrays a father whose daughter is suddenly taken during a championship hockey game. With the captors demanding a billion dollars by game's end, Van Damme frantically sets a plan in motion to rescue his daughter and abort an impending explosion before the final buzzer...5.23158/eoWvKD60lT95Ss1MYNgVExpo5iU.jpg['Universal Pictures', 'Imperial Entertainment', 'Signature Entertainment']['United States of America']1995-12-2264350171.0106.0['English']ReleasedTerror goes into overtime.Sudden Death5.5174.01995.01.838576['Jean-Claude Van Damme', 'Powers Boothe', 'Dorian Harewood', 'Raymond J. Barry', 'Ross Malinger', 'Whittni Wright']['Peter Hyams', 'Karen Elise Baldwin', 'Gene Quintano', 'Moshe Diamant', 'Anders P. Jensen', 'Howard Baldwin', 'John Debney', 'Peter Hyams', 'Steven Kemper']
9['James Bond Collection']58000000.0['Adventure', 'Action', 'Thriller']710enJames Bond must unmask the mysterious head of the Janus Syndicate and prevent the leader from utilizing the GoldenEye weapons system to inflict devastating revenge on Britain.14.686036/5c0ovjT41KnYIHYuF4AWsTe3sKh.jpg['United Artists', 'Eon Productions']['United Kingdom', 'United States of America']1995-11-16352194034.0130.0['English', 'Pусский', 'Español']ReleasedNo limits. No fears. No substitutes.GoldenEye6.61194.01995.06.072311['Pierce Brosnan', 'Sean Bean', 'Izabella Scorupco', 'Famke Janssen', 'Joe Don Baker', 'Judi Dench', 'Gottfried John', 'Robbie Coltrane', 'Alan Cumming', 'Tchéky Karyo', 'Desmond Llewelyn', 'Samantha Bond', 'Michael Kitchen', 'Serena Gordon', 'Simon Kunz', 'Billy J. Mitchell', 'Constantine Gregory', 'Minnie Driver', 'Michelle Arthur', 'Ravil Isyanov']['Martin Campbell', 'Ian Fleming', 'Jeffrey Caine', 'Bruce Feirstein', 'Barbara Broccoli', 'Tom Pevsner', 'Eric Serra', 'Tina Turner', 'Phil Meheux', 'Terry Rawlings', 'Debbie McWilliams', 'Peter Lamont', 'Andrew Ackland-Snow', 'Kathrin Brunner', 'Charles Dwight Lee', 'Michael Ford', 'Lindy Hemming', 'Michael G. Wilson', 'Anthony Waye', 'Michael France', 'Michael Boone', 'Steven Lawrence', 'Tony Graysmark', 'Neil Lamont', 'Pam Dixon', 'Robert Hathaway', 'Charles Bodycomb', 'June Randall', 'Harvey Harrison', 'Roger Pearce', 'Herbert Raditschnig', 'Tim Wooster', 'Keith Hamshere', 'George Whitear', 'Bill Pochetty', 'Luigi Bisioli', 'Steve Foster', 'Chris Corbould', 'Mara Bryan', 'Tim Grover', 'Peter Musgrave', 'Michael A. Carter', 'Graham V. Hartstone', 'John Hayward', 'Jim Shields', 'David John']
belongs_to_collection | budget | genres | id | original_language | overview | popularity | poster_path | production_companies | production_countries | release_date | revenue | runtime | spoken_languages | status | tagline | title | vote_average | vote_count | release_year | return | cast | crew
45532NaN0.0['Horror', 'Mystery', 'Thriller']84419enAn unsuccessful sculptor saves a madman named "The Creeper" from drowning. Seeing an opportunity for revenge, he tricks the psycho into murdering his critics.0.222814/yMnq9mL5uYxbRgwKqyz1cVGCJYJ.jpg['Universal Pictures']['United States of America']1946-03-290.065.0['English']ReleasedMeet...The CREEPER!House of Horrors6.38.01946.0NaN['Rondo Hatton', 'Robert Lowery', 'Virginia Grey', 'Bill Goodwin', 'Martin Kosleck', 'Alan Napier', 'Howard Freeman', 'Virginia Christine', 'Joan Shawlee', 'Byron Foulger', 'Syd Saylor']['Russell A. Gausman', 'John B. Goodman', 'Jack P. Pierce', 'Philip Cahn', 'Jean Yarbrough', 'George Bricker', 'Maury Gertsman', 'Dwight V. Babcock', 'Ben Pivar', 'Abraham Grossman', 'Ralph Warrington']
45533NaN0.0['Mystery', 'Horror']390959enIn this true-crime documentary, we delve into the murder spree that was the inspiration for Joe Berlinger's "Book of Shadows: Blair Witch 2".0.076061/q75tCM4pFmUzdCg0gqcOQquCaYf.jpg[][]2000-10-220.045.0['English']ReleasedNaNShadow of the Blair Witch7.02.02000.0NaN['Tony Abatemarco', 'Andre Brooks', 'Mariclare Costello', 'Bill Dreggors', 'Apollo Dukakis', 'Philip Friedman', 'James Gleason', 'Dilva Henry', 'Bari Hochwald', 'Wendy Hoffman', 'John Huck', 'Rachel Moskowitz', 'Sandy Mulvihill', 'Roger Nolan', 'Chris Parnell', 'Byrne Piven', 'Richard Sexton', 'Rich Williams', 'Ray Xifo']['Ben Rock', 'Ben Rock', 'Jay Bogdanowitsch', 'Pirie Jones', 'Kimberly Rach', 'Ben Rock', 'Sasha Bogdanowitsch', 'Neal Fredericks', 'George Rizkallah', 'Eddie Dunlop', 'David Giella', 'Steven P. Duchscherer', 'Chris Davis', 'Kimberly Eckhout', 'Noelle Polard', 'Noelle Polard', 'Kimberly Eckhout', 'Hillary Wallace', 'Hillary Wallace', 'Craig Borden', 'Renelouise Smith', 'Aaron Walters', 'Shaun Richkind', 'Jeremy M. Gilleece', 'Jeremy M. Gilleece', 'Jackson Hilliard', 'James Grossman', 'Dale Obert', 'Ann Roth']
45534NaN0.0['Horror']289923enA film archivist revisits the story of Rustin Parr, a hermit thought to have murdered seven children while under the possession of the Blair Witch.0.38645/lXtoHVdej6kS1Dc7KAhw05sMos9.jpg['Neptune Salad Entertainment', 'Pirie Productions']['United States of America']2000-10-030.030.0['English']ReleasedDo you know what happened 50 years before "The Blair Witch Project"?The Burkittsville 77.01.02000.0NaN['Monty Bane', 'Lucy Butler', 'David Grammer', 'Bill Dreggors', 'Frank Pastor', 'Heather Donahue', 'Joshua Leonard', 'Michael C. Williams']['Ben Rock', 'Ben Rock']
45535NaN0.0['Science Fiction']222848enIt's the year 3000 AD. The world's most dangerous women are banished to a remote asteroid 45 million light years from earth. Kira Murphy doesn't belong; wrongfully accused of a crime she did not commit, she's thrown in this interplanetary prison and left to her own defenses. But Kira's a fighter, and soon she finds herself in the middle of a female gang war; where everyone wants a piece of the action... and a piece of her! "Caged Heat 3000" takes the Women-in-Prison genre to a whole new level... and a whole new galaxy!0.661558/4lF9LH0b0Z1X94xGK9IOzqEW6k1.jpg['Concorde-New Horizons']['United States of America']1995-01-010.085.0['English']ReleasedNaNCaged Heat 30003.51.01995.0NaN['Lisa Boyle', 'Kena Land', 'Zaneta Polard', 'Don Yanan', 'Debra K. Beatty', 'Mark Sikes', 'Robert J. Ferrelli', 'Ellyn Dawn Humphreys', 'Ron Jeremy', 'Ben Ramsey']['Roger Corman', 'Mike Elliott', 'Aaron Osborne', 'Mike Upton', 'Emile Dupont', 'Felix Chamberlain']
45536NaN0.0['Drama', 'Action', 'Romance']30840enYet another version of the classic epic, with enough variation to make it interesting. The story is the same, but some of the characters are quite different from the usual, in particular Uma Thurman's very special maid Marian. The photography is also great, giving the story a somewhat darker tone.5.683753/fQC46NglNiEMZBv5XHoyLuOWoN5.jpg['Westdeutscher Rundfunk (WDR)', 'Working Title Films', '20th Century Fox Television', 'CanWest Global Communications']['Canada', 'Germany', 'United Kingdom', 'United States of America']1991-05-130.0104.0['English']ReleasedNaNRobin Hood5.726.01991.0NaN['Patrick Bergin', 'Uma Thurman', 'David Morrissey', 'Jürgen Prochnow', 'Jeroen Krabbé']['John Irvin', 'Sam Resnick', 'John McGrath', 'Sam Resnick', 'Sarah Radclyffe', 'Geoffrey Burgon', 'Jason Lehel', 'Peter Tanner', 'Susie Figgis']
45537NaN0.0['Drama', 'Family']439050faRising and falling between a man and woman.0.072051/jldsYflnId4tTWPx8es3uzsB1I8.jpg[]['Iran']NaN0.090.0['فارسی']ReleasedRising and falling between a man and womanSubdue4.01.0NaNNaN['Leila Hatami', 'Kourosh Tahami', 'Elham Korda']['Hamid Nematollah', 'Hamid Nematollah', 'Farshad Mohammadi', 'Masoumeh Bayat', 'Mehdi Saadi', 'Babak Ardalan', 'Azadeh Ghavam', 'Sahand Torabi', 'Homayoun Shajarian']
45538NaN0.0['Drama']111109tlAn artist struggles to finish his work while a storyline about a cult plays in his head.0.178241/xZkmxsNmYXJbKVsTRLLx3pqGHx7.jpg['Sine Olivia']['Philippines']2011-11-170.0360.0['']ReleasedNaNCentury of Birthing9.03.02011.0NaN['Angel Aquino', 'Perry Dizon', 'Hazel Orencio', 'Joel Torre', 'Bart Guingona', 'Soliman Cruz ', 'Roeder', 'Angeli Bayani', 'Dante Perez', 'Betty Uy-Regala', 'Modesta']['Lav Diaz', 'Lav Diaz', 'Dante Perez', 'Lav Diaz', 'Lav Diaz', 'Lav Diaz']
45539NaN0.0['Action', 'Drama', 'Thriller']67758enWhen one of her hits goes wrong, a professional assassin ends up with a suitcase full of a million dollars belonging to a mob boss ...0.903007/d5bX92nDsISNhu3ZT69uHwmfCGw.jpg['American World Pictures']['United States of America']2003-08-010.090.0['English']ReleasedA deadly game of wits.Betrayal3.86.02003.0NaN['Erika Eleniak', 'Adam Baldwin', 'Julie du Page', 'James Remar', 'Damian Chapa', 'Louis Mandylor', 'Tom Wright', 'Jeremy Lelliott', 'James Quattrochi', 'Jason Widener', 'Joe Sabatino', 'Kiko Ellsworth', 'Don Swayze', 'Peter Dobson', 'Darrell Dubovsky']['Mark L. Lester', 'C. Courtney Joyner', 'Jeffrey Goldenberg', 'Richard McHugh', 'João Fernandes']
45540NaN0.0[]227506enIn a small town live two brothers, one a minister and the other one a hunchback painter of the chapel who lives with his wife. One dreadful and stormy night, a stranger knocks at the door asking for shelter. The stranger talks about all the good things of the earthly life the minister is missing because of his puritanical faith. The minister comes to accept the stranger's viewpoint but it is others who will pay the consequences because the minister will discover the human pleasures thanks to, ehem, his sister- in -law… The tormented minister and his cuckolded brother will die in a strange accident in the chapel and later an infant will be born from the minister's adulterous relationship.0.003503/aorBPO7ak8e8iJKT5OcqYxU3jlK.jpg['Yermoliev']['Russia']1917-10-210.087.0[]ReleasedNaNSatan Triumphant0.00.01917.0NaN['Iwan Mosschuchin', 'Nathalie Lissenko', 'Pavel Pavlov', 'Aleksandr Chabrov', 'Vera Orlova']['Yakov Protazanov', 'Joseph N. Ermolieff']
45541NaN0.0[]461257en50 years after decriminalisation of homosexuality in the UK, director Daisy Asquith mines the jewels of the BFI archive to take us into the relationships, desires, fears and expressions of gay men and women in the 20th century.0.163015/s5UkZt6NTsrS7ZF0Rh8nzupRlIU.jpg[]['United Kingdom']2017-06-090.075.0['English']ReleasedNaNQueerama0.00.02017.0NaN[]['Daisy Asquith']
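
In the sample rows above, the `return` values are consistent with `revenue / budget`: for example, 373554033.0 / 30000000.0 is about 12.451801 for Toy Story. The rows with a zero budget show `inf` when revenue is positive and `NaN` when revenue is also zero. This relationship is inferred from the rows shown, not stated by the report. The sketch below reproduces that behaviour explicitly; the function name `budget_return` is hypothetical, and it assumes non-negative inputs.

```python
import math


def budget_return(revenue, budget):
    """Revenue-to-budget ratio with IEEE-754 style results for a zero budget.

    Plain Python raises ZeroDivisionError on float division by zero, so the
    zero-budget cases are handled explicitly to match the inf / NaN values
    that array libraries produce for floating-point division.
    """
    if budget == 0:
        # 0 / 0 is undefined (NaN); positive / 0 is infinite.
        return math.nan if revenue == 0 else math.inf
    return revenue / budget


# Values copied from the sample rows above (revenue, budget).
print(round(budget_return(373554033.0, 30000000.0), 6))  # Toy Story
print(budget_return(76578911.0, 0.0))                     # Father of the Bride Part II
print(budget_return(0.0, 0.0))                            # Grumpier Old Men
print(budget_return(0.0, 58000000.0))                     # Sabrina
```

If this reading is right, it would also explain the alerts for `return`: infinite values where only the budget is zero, and missing values where both budget and revenue are zero.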

Duplicate rows

Most frequently occurring

belongs_to_collection | budget | genres | id | original_language | overview | poster_path | production_companies | production_countries | release_date | revenue | runtime | spoken_languages | status | tagline | title | vote_average | vote_count | release_year | return | cast | crew | # duplicates
34 | NaN | 0.0 | ['Thriller', 'Mystery'] | 141971 | fi | Recovering from a nail gun shot to the head and 13 months of coma, doctor Pekka Valinta starts to unravel the mystery of his past, still suffering from total amnesia. | /8VSZ9coCzxOCW2wE2Qene1H1fKO.jpg | ['Filmiteollisuus Fine'] | ['Finland'] | 2008-12-26 | 0.0 | 108.0 | ['suomi'] | Released | Which one is the first to return - memory or the murderer? | Blackout | 6.7 | 3.0 | 2008.0 | NaN | ['Petteri Summanen', 'Ismo Kallio', 'Eppu Salminen', 'Irina Björklund', 'Hannu-Pekka Björkman', 'Jenni Banerjee', 'Mikko Leppilampi', 'Lena Meriläinen', 'Mari Perankoski', 'Risto Kaskilahti'] | ['JP Siili', 'JP Siili'] | 9
3 | ['Pokémon Collection'] | 0.0 | ['Adventure', 'Fantasy', 'Animation', 'Science Fiction', 'Family'] | 12600 | ja | All your favorite Pokémon characters are back, and are joined for the first time by the legendary Pokémon Celebi and Suicune, in this latest exciting Pokémon adventure! In order to escape a greedy Pokémon hunter, Celebi must use the last of its energy to travel through time to the present day. Celebi brings along Sammy, a boy who had been trying to protect it. Along with Ash, Pikachu, and the rest of the gang, Sammy and Celebi encounter an enemy far more advanced than the hunter left behind in the past. This new enemy possesses a Pokéball called a “Dark Ball,” which transforms the Pokémon it captures into evil and far stronger creatures. When Celebi is captured, the fate of the entire forest is threatened. Let POKÉMON 4EVER transport you to a world of adventure as Ash, Suicune and the rest take action to save the day! | /bqL0PVHbQ8Jmw3Njcl38kW0CoeM.jpg | [] | ['Japan', 'United States of America'] | 2001-07-06 | 28023563.0 | 75.0 | ['日本語'] | Released | NaN | Pokémon 4Ever: Celebi - Voice of the Forest | 5.7 | 82.0 | 2001.0 | inf | ['Veronica Taylor', 'Rachael Lillis', 'Maddie Blaustein', 'Ikue Ōtani'] | ['Hisao Shirai', 'Kunihiko Yuyama', 'Choji Yoshikawa', 'Norman J. Grossfeld', 'Alfred R. Kahn', 'Takashi Kawaguchi', 'Masakazu Kubo', 'Yukako Matsusako', 'Takemoto Mori', 'Jim Malone', 'Hideki Sonoda', 'Shinji Miyazaki', 'Yumiko Fuse', 'Toshio Henmi', 'Yutaka Henmi', 'Yutaka Ita', 'Yukiko Nojiri'] | 4
11 | NaN | 0.0 | ['Action', 'Horror', 'Science Fiction'] | 18440 | en | When a comet strikes Earth and kicks up a cloud of toxic dust, hundreds of humans join the ranks of the living dead. But there's bad news for the survivors: The newly minted zombies are hell-bent on eradicating every last person from the planet. For the few human beings who remain, going head to head with the flesh-eating fiends is their only chance for long-term survival. Yet their battle will be dark and cold, with overwhelming odds. | /tWCyKXHuSrQdLAvNeeVJBnhf1Yv.jpg | [] | ['United States of America'] | 2007-01-01 | 0.0 | 89.0 | ['English'] | Released | NaN | Days of Darkness | 5.0 | 5.0 | 2007.0 | NaN | ['Sabrina Gennarino', 'Tom Eplin'] | ['Jake Kennedy', 'Jake Kennedy'] | 4
12 | NaN | 0.0 | ['Adventure', 'Animation', 'Drama', 'Action', 'Foreign'] | 23305 | en | In feudal India, a warrior (Khan) who renounces his role as the longtime enforcer to a local lord becomes the prey in a murderous hunt through the Himalayan mountains. | /9GlrmbZO7VGyqhaSR1utinRJz3L.jpg | ['Filmfour'] | ['France', 'Germany', 'India', 'United Kingdom'] | 2001-09-23 | 0.0 | 86.0 | ['हिन्दी'] | Released | NaN | The Warrior | 6.3 | 15.0 | 2001.0 | NaN | ['Irrfan Khan', 'Puru Chibber', 'Aino Annuddin', 'Manoj Mishra', 'Nanhe Khan', 'Chander Singh', 'Hemant Maahaor', 'Mandakini Goswami', 'Sunita Sharma', 'Shaukat Baig', 'Gori Shanker', 'Prabhuram', 'Wagaram', 'Ajai Rohilla', 'Noor Mani', 'Sitaram Panchal', 'Chander Prakash Vyas', 'Sanjal', 'Anupam Shyam', 'Amit Kumar', 'Damayanti Marfatia', 'Trilok Singh', 'Pushpa Negi', 'Karuna Sarah Davis', 'Rakesh Mehra', 'Anuradha Advanti', 'Ismail Bashey', 'Madhu'] | ['Asif Kapadia', 'Asif Kapadia', 'Tim Miller'] | 4
14 | NaN | 0.0 | ['Comedy', 'Drama'] | 11115 | en | As an ex-gambler teaches a hot-shot college kid some things about playing cards, he finds himself pulled into the world series of poker, where his protégé is his toughest competition. | /kHaBqrrozaG7rj6GJg3sUCiM29B.jpg | ['Andertainment Group', 'Crescent City Pictures', 'Tag Entertainment'] | ['United States of America'] | 2008-01-29 | 0.0 | 85.0 | ['English'] | Released | NaN | Deal | 5.2 | 22.0 | 2008.0 | NaN | ['Burt Reynolds', 'Bret Harrison', 'Shannon Elizabeth', 'Maria Mason', 'Jennifer Tilly', 'Gary Grubbs', 'Charles Durning', 'Caroline Mckinley', 'Brandon Ray Olive', 'Jon Eyez', 'J.D. Evermore'] | ['Eric Strand', 'Peter Rafelson', 'Gil Cates Jr.', 'Gil Cates Jr.', 'Marc Weinstock', 'Tom Harting', 'Jonathan Cates', 'Frank Zito', 'Michael Amato', 'Scott Lazar', 'Albert J. Salzer', 'Marc Weinstock'] | 4
15 | NaN | 0.0 | ['Comedy', 'Drama'] | 265189 | sv | While holidaying in the French Alps, a Swedish family deals with acts of cowardliness as an avalanche breaks out. | /rGMtc9AtZsnWSSL5VnLaGvx1PI6.jpg | ['Motlys', 'Coproduction Office', 'Film i Väst'] | ['Norway', 'Sweden', 'France'] | 2014-08-15 | 1359497.0 | 118.0 | ['Français', 'Norsk', 'svenska', 'English'] | Released | NaN | Force Majeure | 6.8 | 255.0 | 2014.0 | inf | ['Lisa Loven Kongsli', 'Johannes Bah Kuhnke', 'Clara Wettergren', 'Vincent Wettergren', 'Brady Corbet', 'Kristofer Hivju', 'Fanni Metelius', 'Karin Myrenberg', 'Johannes Moustos'] | ['Ruben Östlund', 'Ruben Östlund', 'Philippe Bober', 'Erik Hemmendorff', 'Marie Kjellson', 'Katja Adomeit', 'Marina Perales', 'Yngve Sæther', 'Ola Fløttum', 'Fredrik Wenzel', 'Jacob Secher Schulsinger', 'Katja Wik', 'Josefin Åsberg', 'Josefin Åsberg', 'Pia Aleborg'] | 4
16 | NaN | 0.0 | ['Comedy'] | 97995 | en | After breaking a mirror in his home, superstitious Max tries to avoid situations which could bring bad luck but in doing so, causes himself the worst luck imaginable. | /4J6Ai4C5YRgfRUTlirrJ7QsmJKU.jpg | ['Max Linder Productions'] | ['United States of America'] | 1921-02-06 | 0.0 | 62.0 | ['English'] | Released | NaN | Seven Years Bad Luck | 5.6 | 4.0 | 1921.0 | NaN | ['Max Linder', 'Alta Allen', 'Ralph McCullough', 'Betty K. Peterson', 'F.B. Crayne', 'Chance Ward', 'Hugh Saxon', 'Thelma Percy', 'C.E. Anderson', 'Lola Gonzales', 'Harry Mann', 'Joe Martin'] | ['Charles Van Enger', 'Max Linder', 'Max Linder', 'Max Linder'] | 4
17 | NaN | 0.0 | ['Crime', 'Drama', 'Thriller'] | 5511 | fr | Hitman Jef Costello is a perfectionist who always carefully plans his murders and who never gets caught. | /cvNW8IXigbaMNo4gKEIps0NGnhA.jpg | ['Fida cinematografica', 'Compagnie Industrielle et Commerciale Cinématographique (CICC)', 'TC Productions', 'Filmel'] | ['France', 'Italy'] | 1967-10-25 | 39481.0 | 105.0 | ['Français'] | Released | There is no solitude greater than that of the Samurai | Le Samouraï | 7.9 | 187.0 | 1967.0 | inf | ['Alain Delon', 'François Périer', 'Nathalie Delon', 'Cathy Rosier', 'Catherine Jourdan', 'Jacques Leroy', 'Michel Boisrond', 'Robert Favart', 'Jean-Pierre Posier', 'Roger Fradet', 'Carlo Nell', 'Robert Rondo', 'André Salgues', 'André Thorent', 'Jacques Deschamps', 'Georges Casati', 'Jacques Léonard', 'Pierre Vaudier', 'Maurice Magalon', 'Gaston Meunier', 'Jean Gold', 'Georges Billy', 'Ari Aricardi', 'Guy Bonnafoux', 'Humberto Catalano', 'Carl Lechner', 'Maria Maneva'] | ['Henri Decaë', 'Raymond Borderie', 'Jean-Pierre Melville', 'Jean-Pierre Melville', 'Jean-Pierre Melville', 'François de Roubaix', 'Georges Pellegrin', 'Georges Pellegrin', 'Eugène Lépicier', 'Monique Bonnot', 'Yolande Maurette', 'Joan McLeod'] | 4
19 | NaN | 0.0 | ['Documentary'] | 84198 | en | Using personal stories, this powerful documentary illuminates the plight of the 49 million Americans struggling with food insecurity. A single mother, a small-town policeman and a farmer are among those for whom putting food on the table is a daily battle. | /jn8L1QdWWX5c0NUOLjzaSXtZrbt.jpg | [] | ['United States of America'] | 2012-03-22 | 0.0 | 84.0 | ['English'] | Released | One Nation. Underfed. | A Place at the Table | 6.9 | 7.0 | 2012.0 | NaN | ['Jeff Bridges', 'Tom Colicchio', 'Mariana Chilton', 'Ken Cook', 'Barbie Izquierdo', 'James McGovern', 'Marion Nestle', 'Raj Patel', 'Janet Poppendieck'] | ['Kristi Jacobson', 'Lori Silverbush'] | 4
21 | NaN | 0.0 | ['Drama', 'Comedy'] | 168538 | en | In Zola's Paris, an ingenue arrives at a tony bordello: she's Nana, guileless, but quickly learning to use her erotic innocence to get what she wants. She's an actress for a soft-core filmmaker and soon is the most popular courtesan in Paris, parlaying this into a house, bought for her by a wealthy banker. She tosses him and takes up with her neighbor, a count of impeccable rectitude, and with the count's impressionable son. The count is soon fetching sticks like a dog and mortgaging his lands to satisfy her whims. | /pg4PUHRFrgNfACHSh5MITQ2gYch.jpg | ['Cannon Group', 'Metro-Goldwyn-Mayer (MGM)'] | [] | 1983-06-13 | 0.0 | 92.0 | [] | Released | NaN | Nana, the True Key of Pleasure | 4.7 | 3.0 | 1983.0 | NaN | ['Katya Berger', 'Jean-Pierre Aumont', 'Yehuda Efroni', 'Yehuda Efroni', 'Massimo Serato', 'Debra Berger', 'Shirin Taylor', 'Annie Belle', 'Paul Müller', 'Marcus Beresford', 'Robert Bridges', 'Tom Felleghy'] | ['Marc Behm', 'Émile Zola', 'Dan Wolman'] | 4